1.
Glycobiology ; 33(5): 354-357, 2023 06 03.
Article in English | MEDLINE | ID: mdl-36799723

ABSTRACT

Recent technological advances in glycobiology have resulted in a large influx of data and the publication of many papers describing discoveries in glycoscience. However, the terms used in describing glycan structural features are not standardized, making it difficult to harmonize data across biomolecular databases, hampering the harvesting of information across studies and hindering text mining and curation efforts. To address this shortcoming, the Glycan Structure Dictionary has been developed as a reference dictionary to provide a standardized list of widely used glycan terms that can help in the curation and mapping of glycan structures described in publications. Currently, the dictionary has 190 glycan structure terms with 297 synonyms linked to 3,332 publications. For a term to be included in the dictionary, it must be present in at least 2 peer-reviewed publications. Synonyms, annotations, and cross-references to GlyTouCan, GlycoMotif, and other relevant databases and resources are also provided when available. The purpose of this effort is to facilitate biocuration, assist in the development of text mining tools, improve the harmonization of search and browse capabilities in glycoinformatics resources, and help to map glycan structures to function and disease. It is also expected that authors will use these terms to describe glycan structures in their manuscripts over time. A mechanism is also provided for researchers to submit terms for potential incorporation. The dictionary is available at https://wiki.glygen.org/Glycan_structure_dictionary.
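The inclusion rule stated in this abstract (a term must appear in at least 2 peer-reviewed publications) can be illustrated with a minimal sketch. The input shape, the case-insensitive matching, and the publication identifiers are assumptions made for illustration only; the abstract does not describe how the dictionary's curators actually count mentions.

```python
from collections import defaultdict

MIN_PUBLICATIONS = 2  # threshold stated in the abstract


def eligible_terms(mentions):
    """Return terms seen in at least MIN_PUBLICATIONS distinct publications.

    `mentions` is an iterable of (term, publication_id) pairs. Both this
    input shape and the case-insensitive matching are assumptions.
    """
    pubs_by_term = defaultdict(set)
    for term, pub_id in mentions:
        pubs_by_term[term.strip().lower()].add(pub_id)
    return sorted(
        term for term, pubs in pubs_by_term.items()
        if len(pubs) >= MIN_PUBLICATIONS
    )


# Hypothetical mentions; "pub-A" and "pub-B" are placeholder identifiers.
example_mentions = [
    ("Core fucose", "pub-A"),
    ("core fucose", "pub-B"),
    ("Bisecting GlcNAc", "pub-A"),
    ("bisecting GlcNAc", "pub-A"),  # same publication twice counts once
]
print(eligible_terms(example_mentions))
```

Counting distinct publications with a set, rather than raw mentions, keeps a term that is repeated many times in a single paper from passing the threshold.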


Subjects
Data Mining; Polysaccharides; Data Mining/methods; Databases, Factual; Polysaccharides/chemistry; Glycomics/methods
2.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34015823

ABSTRACT

In response to the COVID-19 outbreak, scientists and medical researchers are capturing a wide range of host responses, symptoms and lingering postrecovery problems within the human population. These variable clinical manifestations suggest differences in influential factors, such as innate and adaptive host immunity, existing or underlying health conditions, comorbidities, genetics and other factors, compounding the complexity of COVID-19 pathobiology and potential biomarkers associated with the disease, as they become available. The heterogeneous data pose challenges for efficient extrapolation of information into clinical applications. We have curated 145 COVID-19 biomarkers by developing a novel cross-cutting disease biomarker data model that allows integration and evaluation of biomarkers in patients with comorbidities. Most biomarkers are related to the immune (SAA, TNF-α and IP-10) or coagulation (D-dimer, antithrombin and VWF) cascades, suggesting complex vascular pathobiology of the disease. Furthermore, we observe commonality with established cancer biomarkers (ACE2, IL-6, IL-4 and IL-2) as well as biomarkers for metabolic syndrome and diabetes (CRP, NLR and LDL). We explore these trends as we put forth a COVID-19 biomarker resource (https://data.oncomx.org/covid19) that will help researchers and diagnosticians alike.

3.
Glycobiology ; 32(10): 855-870, 2022 09 19.
Article in English | MEDLINE | ID: mdl-35925813

ABSTRACT

Molecular biomarkers measure discrete components of biological processes that can contribute to disorders when impaired. Great interest exists in discovering early cancer biomarkers to improve outcomes. Biomarkers represented in a standardized data model, integrated with multi-omics data, may improve the understanding and use of novel biomarkers such as glycans and glycoconjugates. Among altered components in tumorigenesis, N-glycans exhibit substantial biomarker potential, when analyzed with their protein carriers. However, such data are distributed across publications and databases of diverse formats, which hamper their use in research and clinical application. Mass spectrometry measures of 50 N-glycans on 7 serum proteins in liver disease were integrated (as a panel) into a cancer biomarker data model, providing a unique identifier, standard nomenclature, links to glycan resources, and accession and ontology annotations to standard protein, gene, disease, and biomarker information. Data provenance was documented with a standardized United States Food and Drug Administration-supported BioCompute Object. Using the biomarker data model allows the capture of granular information, such as glycans with different levels of abundance in cirrhosis, hepatocellular carcinoma, and transplant groups. Such representation in a standardized data model harmonizes glycomics data in a unified framework, making glycan-protein biomarker data exploration more available to investigators and to other data resources. The biomarker data model we describe can be used by researchers to describe their novel glycan and glycoconjugate biomarkers; it can integrate N-glycan biomarker data with multi-source biomedical data and can foster discovery and insight within a unified data framework for glycan biomarker representation, thereby making the data FAIR (Findable, Accessible, Interoperable, Reusable) (https://www.go-fair.org/fair-principles/).


Subjects
Carcinoma, Hepatocellular; Liver Neoplasms; Biomarkers; Biomarkers, Tumor; Carcinoma, Hepatocellular/diagnosis; Glycomics/methods; Humans; Liver Neoplasms/diagnosis; Polysaccharides/chemistry
4.
Glycobiology ; 31(11): 1510-1519, 2021 12 18.
Article in English | MEDLINE | ID: mdl-34314492

ABSTRACT

Glycans play a vital role in health, disease, bioenergy, biomaterials and bio-therapeutics. As a result, there is keen interest to identify and increase glycan data in bioinformatics databases like ChEBI and PubChem, and connecting them to resources at the EMBL-EBI and NCBI to facilitate access to important annotations at a global level. GlyTouCan is a comprehensive archival database that contains glycans obtained primarily through batch upload from glycan repositories, glycoprotein databases and individual laboratories. In many instances, the glycan structures deposited in GlyTouCan may not be fully defined or have supporting experimental evidence and citations. Databases like ChEBI and PubChem were designed to accommodate complete atomistic structures with well-defined chemical linkages. As a result, they cannot easily accommodate the structural ambiguity inherent in glycan databases. Consequently, there is a need to improve the organization of glycan data coherently to enhance connectivity across the major NCBI, EMBL-EBI and glycoscience databases. This paper outlines a workflow developed in collaboration between GlyGen, ChEBI and PubChem to improve the visibility and connectivity of glycan data across these resources. GlyGen hosts a subset of glycans (~29,000) from the GlyTouCan database and has submitted valuable glycan annotations to the PubChem database and integrated over 10,500 (including ambiguously defined) glycans into the ChEBI database. The integrated glycans were prioritized based on links to PubChem and connectivity to glycoprotein data. The pipeline provides a blueprint for how glycan data can be harmonized between different resources. The current PubChem, ChEBI and GlyTouCan mappings can be downloaded from GlyGen (https://data.glygen.org).


Subjects
Databases, Chemical; Glycoproteins/chemistry; Polysaccharides/chemistry; Software; Carbohydrate Conformation; Glycomics
5.
Bioinformatics ; 36(12): 3941-3943, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32324859

ABSTRACT

SUMMARY: Glycoinformatics plays a major role in glycobiology research, and the development of a comprehensive glycoinformatics knowledgebase is critical. This application note describes the GlyGen data model, processing workflow and the data access interfaces featuring programmatic use case example queries based on specific biological questions. The GlyGen project is a data integration, harmonization and dissemination project for carbohydrate and glycoconjugate-related data retrieved from multiple international data sources including UniProtKB, GlyTouCan, UniCarbKB and other key resources. AVAILABILITY AND IMPLEMENTATION: GlyGen web portal is freely available to access at https://glygen.org. The data portal, web services, SPARQL endpoint and GitHub repository are also freely available at https://data.glygen.org, https://api.glygen.org, https://sparql.glygen.org and https://github.com/glygener, respectively. All code is released under license GNU General Public License version 3 (GNU GPLv3) and is available on GitHub https://github.com/glygener. The datasets are made available under Creative Commons Attribution 4.0 International (CC BY 4.0) license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Knowledge Bases; Software; Glycomics; Information Storage and Retrieval; Workflow
6.
PLoS Biol ; 16(12): e3000099, 2018 12.
Article in English | MEDLINE | ID: mdl-30596645

ABSTRACT

A personalized approach based on a patient's or pathogen's unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to findable, accessible, interoperable, and reusable (FAIR) guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet-lab procedures to computational methods. The BioCompute framework (https://w3id.org/biocompute/1.3.0) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that standard and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the "Open-Stand.org principles for collaborative open standards development." With high-throughput sequencing (HTS) studies communicated using a BCO, regulatory agencies (e.g., Food and Drug Administration [FDA]), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next-generation sequencing workflow exchange, reporting, and regulatory reviews.
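The five domains named in this abstract suggest a simple completeness check on a draft record, sketched below. The key names are derived from the abstract's wording and are placeholders; the exact field names and required structure are defined by the BioCompute specification and may differ.

```python
# Domain names follow the wording of the abstract. The exact JSON keys used
# by the BioCompute specification may differ, so treat these as placeholders.
EXPECTED_DOMAINS = (
    "provenance_domain",
    "usability_domain",
    "execution_domain",
    "verification_kit",
    "error_domain",
)


def missing_domains(record):
    """List the expected domains that are absent or empty in `record` (a dict)."""
    return [name for name in EXPECTED_DOMAINS if not record.get(name)]


# Hypothetical draft record; all values are illustrative, not real data.
draft = {
    "provenance_domain": {"name": "example workflow", "version": "1.0"},
    "usability_domain": ["Illustrative record, not a real analysis"],
    "execution_domain": {"script": "run_pipeline.sh"},
}
print(missing_domains(draft))
```

A check like this only confirms that each section is present; it does not validate the content of any domain against the standard.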


Subjects
Computational Biology/methods; Sequence Analysis, DNA/methods; Animals; Communication; Computational Biology/standards; Genome; Genomics/methods; High-Throughput Nucleotide Sequencing; Humans; Precision Medicine/trends; Reproducibility of Results; Sequence Analysis, DNA/standards; Software; Workflow
7.
Nucleic Acids Res ; 46(D1): D1128-D1136, 2018 01 04.
Article in English | MEDLINE | ID: mdl-30053270

ABSTRACT

Single-nucleotide variation and gene expression of disease samples represent important resources for biomarker discovery. Many databases have been built to host and make available such data to the community, but these databases are frequently limited in scope and/or content. BioMuta, a database of cancer-associated single-nucleotide variations, and BioXpress, a database of cancer-associated differentially expressed genes and microRNAs, differ from other disease-associated variation and expression databases primarily through the aggregation of data across many studies into a single source with a unified representation and annotation of functional attributes. Early versions of these resources were initiated by pilot funding for specific research applications, but newly awarded funds have enabled hardening of these databases to production-level quality and will allow for sustained development of these resources for the next few years. Because both resources were developed using a similar methodology of integration, curation, unification, and annotation, we present BioMuta and BioXpress as allied databases that will facilitate a more comprehensive view of gene associations in cancer. BioMuta and BioXpress are hosted on the High-performance Integrated Virtual Environment (HIVE) server at the George Washington University at https://hive.biochemistry.gwu.edu/biomuta and https://hive.biochemistry.gwu.edu/bioxpress, respectively.


Subjects
Biomarkers, Tumor/genetics; Databases, Genetic; Knowledge Bases; Mutation; Neoplasms/genetics; Gene Expression Regulation, Neoplastic; Humans; MicroRNAs; User-Computer Interface
8.
Plant Physiol ; 175(3): 1350-1369, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28899960

ABSTRACT

Drought stress is one of the main environmental problems encountered by crop growers. Reduction in arable land area and reduced water availability make it paramount to identify and develop strategies to allow crops to be more resilient in water-limiting environments. The plant hormone abscisic acid (ABA) plays an important role in the plants' response to drought stress through its control of stomatal aperture and water transpiration, and transgenic modulation of ABA levels therefore represents an attractive avenue to improve the drought tolerance of crops. Several steps in the ABA-signaling pathway are controlled by ubiquitination involving really interesting new genes (RING) domain-containing proteins. We characterized the maize (Zea mays) RING protein family and identified two novel RING-H2 genes called ZmXerico1 and ZmXerico2. Expression of ZmXerico genes is induced by drought stress, and we show that overexpression of ZmXerico1 and ZmXerico2 in Arabidopsis and maize confers ABA hypersensitivity and improved water use efficiency, which can lead to enhanced maize yield performance in a controlled drought-stress environment. Overexpression of ZmXerico1 and ZmXerico2 in maize results in increased ABA levels and decreased levels of ABA degradation products diphaseic acid and phaseic acid. We show that ZmXerico1 is localized in the endoplasmic reticulum, where ABA 8'-hydroxylases have been shown to be localized, and that it functions as an E3 ubiquitin ligase. We demonstrate that ZmXerico1 plays a role in the control of ABA homeostasis through regulation of ABA 8'-hydroxylase protein stability, representing a novel control point in the regulation of the ABA pathway.


Subjects
Abscisic Acid/metabolism; Adaptation, Physiological; Droughts; Homeostasis; RING Finger Domains; Ubiquitin-Protein Ligases/chemistry; Ubiquitin-Protein Ligases/metabolism; Zea mays/physiology; Adaptation, Physiological/genetics; Amino Acid Sequence; Arabidopsis/physiology; Circadian Rhythm/genetics; Consensus Sequence; Dehydration; Endoplasmic Reticulum/metabolism; Enzyme Stability; Gene Expression Regulation, Plant; Genes, Plant; Green Fluorescent Proteins/metabolism; Multigene Family; Plant Leaves/metabolism; Plant Proteins/chemistry; Plant Proteins/genetics; Plant Proteins/metabolism; Plant Roots/metabolism; Plant Stomata/physiology; Plants, Genetically Modified; Protein Binding; Protoplasts/metabolism; Recombinant Fusion Proteins/metabolism; Seeds/growth & development; Stress, Physiological; Zea mays/enzymology; Zea mays/genetics
10.
Sci Data ; 8(1): 25, 2021 01 21.
Article in English | MEDLINE | ID: mdl-33479245

ABSTRACT

Over the past 35 years, ~1700 articles have characterized protein O-GlcNAcylation. Found in almost all living organisms, this post-translational modification of serine and threonine residues is highly conserved and key to biological processes. With half of the primary research articles using human models, the O-GlcNAcome recently reached a milestone of 5000 human proteins identified. Herein, we provide an extensive inventory of human O-GlcNAcylated proteins, their O-GlcNAc sites, identification methods, and corresponding references (www.oglcnac.mcw.edu). In the absence of a comprehensive online resource for O-GlcNAcylated proteins, this list serves as the only database of O-GlcNAcylated proteins. Based on the thorough analysis of the amino acid sequence surrounding 7002 O-GlcNAc sites, we progress toward a more robust semi-consensus sequence for O-GlcNAcylation. Moreover, we offer a comprehensive meta-analysis of human O-GlcNAcylated proteins for protein domains, cellular and tissue distribution, and pathways in health and diseases, reinforcing that O-GlcNAcylation is a master regulator of cell signaling, equal to the widely studied phosphorylation.


Subjects
Databases, Protein; Glycoproteins; Glycosylation; Humans; Protein Processing, Post-Translational
11.
Database (Oxford) ; 2021, 2021 03 30.
Article in English | MEDLINE | ID: mdl-33784373

ABSTRACT

Developments in high-throughput sequencing (HTS) result in an exponential increase in the amount of data generated by sequencing experiments, an increase in the complexity of bioinformatics analysis reporting and an increase in the types of data generated. These increases in volume, diversity and complexity of the data generated and their analysis expose the necessity of a structured and standardized reporting template. BioCompute Objects (BCOs) provide the requisite support for communication of HTS data analysis that includes support for workflow, as well as data, curation, accessibility and reproducibility of communication. BCOs standardize how researchers report provenance and the established verification and validation protocols used in workflows while also being robust enough to convey content integration or curation in knowledge bases. BCOs that encapsulate tools, platforms, datasets and workflows are FAIR (findable, accessible, interoperable and reusable) compliant. Providing operational workflow and data information facilitates interoperability between platforms and incorporation of future datasets within an HTS analysis for use within industrial, academic and regulatory settings. Cloud-based platforms, including High-performance Integrated Virtual Environment (HIVE), Cancer Genomics Cloud (CGC) and Galaxy, support BCO generation for users. Given the 100K+ userbase between these platforms, BioCompute can be leveraged for workflow documentation. In this paper, we report the availability of platform-dependent and platform-independent BCO tools: HIVE BCO App, CGC BCO App, Galaxy BCO API Extension and BCO Portal. Community engagement was utilized to evaluate tool efficacy. We demonstrate that these tools further advance BCO creation from text editing approaches used in earlier releases of the standard. Moreover, we demonstrate that integrating BCO generation within existing analysis platforms greatly streamlines BCO creation while capturing granular workflow details. We also demonstrate that the BCO tools described in the paper provide an approach to solve the long-standing challenge of standardizing workflow descriptions that are both human and machine readable while accommodating manual and automated curation with evidence tagging. Database URL: https://www.biocomputeobject.org/resources.


Subjects
Computational Biology; Genomics; High-Throughput Nucleotide Sequencing; Humans; Reproducibility of Results; Software; Workflow
12.
JCO Clin Cancer Inform ; 4: 210-220, 2020 03.
Article in English | MEDLINE | ID: mdl-32142370

ABSTRACT

PURPOSE: The purpose of OncoMX knowledgebase development was to integrate cancer biomarker and relevant data types into a meta-portal, enabling the research of cancer biomarkers side by side with other pertinent multidimensional data types. METHODS: Cancer mutation, cancer differential expression, cancer expression specificity, healthy gene expression from human and mouse, literature mining for cancer mutation and cancer expression, and biomarker data were integrated, unified by relevant biomedical ontologies, and subjected to rule-based automated quality control before ingestion into the database. RESULTS: OncoMX provides integrated data encompassing more than 1,000 unique biomarker entries (939 from the Early Detection Research Network [EDRN] and 96 from the US Food and Drug Administration) mapped to 20,576 genes that have either mutation or differential expression in cancer. Sentences reporting mutation or differential expression in cancer were extracted from more than 40,000 publications, and healthy gene expression data with samples mapped to organs are available for both human genes and their mouse orthologs. CONCLUSION: OncoMX has prioritized user feedback as a means of guiding development priorities. By mapping to and integrating data from several cancer genomics resources, it is hoped that OncoMX will foster a dynamic engagement between bioinformaticians and cancer biomarker researchers. This engagement should culminate in a community resource that substantially improves the ability and efficiency of exploring cancer biomarker data and related multidimensional data.


Subjects
Biomarkers, Tumor/analysis; Computational Biology/methods; Data Mining/methods; Databases, Genetic/standards; Knowledge Bases; Neoplasms/diagnosis; Software; Animals; Biological Ontologies; Humans; Mice; Neoplasms/therapy; User-Computer Interface
13.
PLoS One ; 14(4): e0213770, 2019.
Article in English | MEDLINE | ID: mdl-30934003

ABSTRACT

Human endogenous retroviruses (HERVs) have been investigated for potential links with human cancer. However, the distribution of somatic nucleotide variations in HERV elements has not been explored in detail. This study aims to identify HERV elements with an over-representation of somatic mutations (hot spots) in cancer patients. Four HERV elements with mutation hotspots were identified that overlap with exons of four human protein coding genes. These hotspots were identified based on the significant over-representation (p<8.62e-4) of non-synonymous single-nucleotide variations (nsSNVs). These genes are TNN (HERV-9/LTR12), OR4K15 (HERV-IP10F/LTR10F), ZNF99 (HERV-W/HERV17/LTR17), and KIR2DL1 (MST/MaLR). In an effort to identify mutations that affect survival, all nsSNVs were further evaluated and it was found that kidney cancer patients with mutation C2270G in ZNF99 have a significantly lower survival rate (hazard ratio = 2.6) compared to those without it. Among HERV elements in the human non-protein coding regions, we found 788 HERVs with significantly elevated numbers of somatic single-nucleotide variations (SNVs) (p<1.60e-5). From this category the top three HERV elements with significantly over-represented SNVs are HERV-H/LTR7, HERV-9/LTR12 and HERV-L/MLT2. The majority of the SNVs in these 788 HERV elements are located in three DNA functional groups: long non-coding RNAs (lncRNAs) (60%), introns (22.2%) and transcription factor binding sites (TFBS) (14.8%). This study provides a list of mutational hotspots in HERVs, which could potentially be used as biomarkers and therapeutic targets.


Subjects
Endogenous Retroviruses/genetics; Genome, Human/genetics; Kidney Neoplasms/genetics; Polymorphism, Single Nucleotide/genetics; Exons/genetics; Gene Expression Regulation, Neoplastic; Humans; Introns/genetics; Kidney Neoplasms/pathology; Mutation; RNA, Long Noncoding/genetics; Receptors, KIR2DL1/genetics; Survival Analysis; Tenascin/genetics; Terminal Repeat Sequences/genetics
14.
Comput Biol Med ; 103: 183-197, 2018 12 01.
Article in English | MEDLINE | ID: mdl-30384176

ABSTRACT

microRNAs (miRNAs) functioning in gene silencing have been associated with cancer progression. However, common abnormal miRNA expression patterns and their potential roles in cancer have not yet been evaluated. To account for individual differences between patients, we retrieved miRNA sequencing data for 575 patients with both tumor and adjacent non-tumorous tissues from 14 cancer types from The Cancer Genome Atlas (TCGA). We then performed differential expression analysis using DESeq2 and edgeR. Results showed that cancer types can be grouped based on the distribution of miRNAs with different expression patterns between tumor and non-tumor samples. We found 81 significantly differentially expressed miRNAs (SDEmiRNAs) in a single cancer. We also found 21 key SDEmiRNAs (nine over-expressed and 12 under-expressed) associated with at least eight cancers each and enriched in more than 60% of patients per cancer, including four newly identified SDEmiRNAs (hsa-mir-4746, hsa-mir-3648, hsa-mir-3687, and hsa-mir-1269a). The downstream effects of these 21 SDEmiRNAs on cellular function were evaluated through enrichment and pathway analysis of 7186 protein-coding gene targets mined from literature reports of differential expression of miRNAs in cancer. This analysis enables identification of SDEmiRNA functional similarity in cell proliferation control across a wide range of cancers, and assembly of common regulatory networks over cancer-related pathways. These findings were validated by construction of a regulatory network in the PI3K pathway. This study provides evidence for the value of further analysis of SDEmiRNAs as potential biomarkers and therapeutic targets for cancer diagnosis and treatment.
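The selection criterion for the key SDEmiRNAs (at least eight cancers, each with more than 60% of patients affected) can be sketched as a simple filter. This is one possible reading of the abstract's wording, not the authors' pipeline; the data layout and the miRNA and cancer names below are hypothetical.

```python
MIN_CANCERS = 8              # "at least eight cancers"
MIN_PATIENT_FRACTION = 0.60  # "more than 60% of patients per cancer"


def key_mirnas(fractions):
    """Select miRNAs that exceed the patient fraction in enough cancers.

    `fractions` maps miRNA name -> {cancer type: fraction of patients in
    which the miRNA is significantly differentially expressed}. This layout
    is an assumption made for the sketch.
    """
    selected = []
    for mirna, per_cancer in fractions.items():
        supporting = [
            cancer for cancer, fraction in per_cancer.items()
            if fraction > MIN_PATIENT_FRACTION
        ]
        if len(supporting) >= MIN_CANCERS:
            selected.append(mirna)
    return sorted(selected)


# Hypothetical toy data; names and values are placeholders.
toy = {
    "mir-A": {f"cancer{i}": 0.75 for i in range(9)},
    "mir-B": {f"cancer{i}": 0.75 for i in range(5)},
    "mir-C": {f"cancer{i}": 0.60 for i in range(10)},  # exactly 60% does not pass
}
print(key_mirnas(toy))
```

The strict comparison reflects "more than 60%"; switching to `>=` would be the natural change if the threshold were meant to be inclusive.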


Subjects
Gene Expression Profiling/methods; Genomics/methods; MicroRNAs/genetics; Neoplasms/genetics; Gene Expression Regulation, Neoplastic/genetics; Gene Regulatory Networks/genetics; Humans; MicroRNAs/analysis; MicroRNAs/metabolism; MicroRNAs/physiology; Neoplasms/metabolism; Neoplasms/mortality; Neoplasms/physiopathology
15.
Methods Mol Biol ; 694: 91-105, 2011.
Article in English | MEDLINE | ID: mdl-21082430

ABSTRACT

The rapid growth of protein sequence databases has necessitated the development of methods to computationally derive annotation for uncharacterized entries. Most such methods focus on "global" annotation, such as molecular function or biological process. Methods to supply high-accuracy "local" annotation to functional sites based on structural information at the level of individual amino acids are relatively rare. In this chapter we will describe a method we have developed for annotation of functional residues within experimentally-uncharacterized proteins that relies on position-specific site annotation rules (PIR Site Rules) derived from structural and experimental information. These PIR Site Rules are manually defined to allow for conditional propagation of annotation. Each rule specifies a tripartite set of conditions whereby candidates for annotation must pass a whole-protein classification test (that is, have end-to-end match to a whole-protein-based HMM), match a site-specific profile HMM and, finally, match functionally and structurally characterized residues of a template. Positive matches trigger the appropriate annotation for active site residues, binding site residues, modified residues, or other functionally important amino acids. The strict criteria used in this process have rendered high-confidence annotation suitable for UniProtKB/Swiss-Prot features.
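The tripartite condition described in this abstract can be sketched as three checks applied in order, with annotation propagated only when all three pass. In the sketch below the two HMM tests are replaced by plain callables, and template positions are assumed to be already mapped onto the candidate sequence; the class, function names and toy rule are hypothetical and do not reproduce the PIR implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class SiteRule:
    """Hypothetical stand-in for a site rule with three conditions."""
    matches_family: Callable[[str], bool]        # whole-protein classification test
    matches_site_profile: Callable[[str], bool]  # site-specific profile test
    template_residues: Dict[int, str]            # 1-based position -> required residue
    annotation: str


def apply_rule(rule: SiteRule, sequence: str) -> Optional[str]:
    """Return the annotation only if all three conditions hold, else None."""
    if not rule.matches_family(sequence):
        return None
    if not rule.matches_site_profile(sequence):
        return None
    for position, residue in rule.template_residues.items():
        if position > len(sequence) or sequence[position - 1] != residue:
            return None
    return rule.annotation


# Toy rule: simple string tests stand in for the HMM matches.
toy_rule = SiteRule(
    matches_family=lambda seq: len(seq) >= 6,
    matches_site_profile=lambda seq: "CGPC" in seq,
    template_residues={2: "C", 5: "C"},
    annotation="active site",
)
print(apply_rule(toy_rule, "WCGPCK"))   # all three conditions hold
print(apply_rule(toy_rule, "AWCGPCK"))  # residues are shifted, so no annotation
```

Ordering the checks from coarse to fine means a sequence that fails the whole-protein test is rejected before any residue-level comparison is attempted.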


Subjects
Amino Acids/chemistry; Computational Biology/methods; Databases, Protein; Knowledge Bases; Molecular Sequence Annotation/methods; Proteins/chemistry; Amino Acid Sequence; Coproporphyrinogen Oxidase/chemistry; Coproporphyrinogen Oxidase/metabolism; Escherichia coli/metabolism; Molecular Sequence Data; Thioredoxins/chemistry; Thioredoxins/metabolism
16.
Bioinformatics ; 21(9): 1853-8, 2005 May 01.
Article in English | MEDLINE | ID: mdl-15691854

ABSTRACT

MOTIVATION: Knowledge of the transmembrane helical topology can help identify binding sites and infer functions for membrane proteins. However, because membrane proteins are hard to solubilize and purify, only a very small number of membrane proteins have structure and topology experimentally determined. This has motivated various computational methods for predicting the topology of membrane proteins. RESULTS: We present an improved hidden Markov model, TMMOD, for the identification and topology prediction of transmembrane proteins. Our model uses TMHMM as a prototype, but differs from TMHMM by the architecture of the submodels for loops on both sides of the membrane and also by the model training procedure. In cross-validation experiments using a set of 83 transmembrane proteins with known topology, TMMOD outperformed TMHMM and other existing methods, with an accuracy of 89% for both topology and locations. In another experiment using a separate set of 160 transmembrane proteins, TMMOD had 84% for topology and 89% for locations. When utilized for identifying transmembrane proteins from non-transmembrane proteins, particularly signal peptides, TMMOD has consistently fewer false positives than TMHMM does. Application of TMMOD to a collection of complete genomes shows that the number of predicted membrane proteins accounts for approximately 20-30% of all genes in those genomes, and that the topology where both the N- and C-termini are in the cytoplasm is dominant in these organisms except for Caenorhabditis elegans. AVAILABILITY: http://liao.cis.udel.edu/website/servers/TMMOD/
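As background for how a hidden Markov model assigns residues to membrane or loop regions, the sketch below runs Viterbi decoding on a toy two-state model. It is not TMMOD or TMHMM: the two states, the reduced H/P (hydrophobic/polar) alphabet, and every probability are invented for illustration, whereas the published models use richer architectures and trained parameters.

```python
import math

# Toy model: "M" = membrane helix, "L" = loop. All numbers are made up.
STATES = ("M", "L")
START = {"M": 0.5, "L": 0.5}
TRANS = {"M": {"M": 0.9, "L": 0.1}, "L": {"M": 0.1, "L": 0.9}}
EMIT = {"M": {"H": 0.9, "P": 0.1}, "L": {"H": 0.1, "P": 0.9}}


def viterbi(observations):
    """Return the most probable state path for a string of 'H'/'P' symbols."""
    if not observations:
        return ""
    # Log-probabilities avoid underflow on long sequences.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for symbol in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][symbol])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("HHHHPPPP"))
print(viterbi("HHPHH"))
```

With these toy parameters a single polar residue inside a hydrophobic run stays labeled as membrane, because two state switches cost more than one unlikely emission; this smoothing is the main reason to decode a whole path rather than classify residues one at a time.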


Subjects
Algorithms; Artificial Intelligence; Chromosome Mapping/methods; Membrane Proteins/chemistry; Membrane Proteins/genetics; Models, Chemical; Models, Molecular; Amino Acid Sequence; Computer Simulation; Markov Chains; Membrane Proteins/analysis; Models, Statistical; Molecular Sequence Data; Protein Conformation; Sequence Homology, Amino Acid; Software
17.
Bioinformatics ; 21(10): 2287-93, 2005 May 15.
Article in English | MEDLINE | ID: mdl-15797916

ABSTRACT

A simple approach for the sensitive detection of distant relationships among protein families and for sequence-structure alignment via comparison of hidden Markov models based on their quasi-consensus sequences is presented. Using a previously published benchmark dataset, the approach is demonstrated to give better homology detection and yield alignments with improved accuracy in comparison to an existing state-of-the-art dynamic programming profile-profile comparison method. This method also runs significantly faster and is therefore suitable for a server covering the rapidly increasing structure database. A server based on this method is available at http://liao.cis.udel.edu/website/servers/modmod


Subjects
Algorithms; Gene Expression Profiling/methods; Models, Chemical; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Sequence; Consensus Sequence; Markov Chains; Models, Biological; Models, Statistical; Molecular Sequence Data; Sequence Homology, Amino Acid
18.
Bioinformatics ; 18(3): 496-7, 2002 Mar.
Article in English | MEDLINE | ID: mdl-11934755

ABSTRACT

SUMMARY: A public server for evaluating the accuracy of protein sequence alignment methods is presented. CASA is an implementation of the alignment accuracy benchmark presented by Sauder et al. (Proteins, 40, 6-22, 2000). The benchmark currently contains 39321 pairwise protein structure alignments produced with the CE program from SCOP domain definitions. The server produces graphical and tabular comparisons of the accuracy of a user's input sequence alignments with other commonly used programs, such as BLAST, PSI-BLAST, Clustal W, and SAM-T99. AVAILABILITY: The server is located at http://capb.dbi.udel.edu/casa.


Subjects
Databases, Protein; Proteins/chemistry; Sequence Alignment/methods; Sequence Alignment/standards; Software; Algorithms; Calibration; Computing Methodologies; Evaluation Studies as Topic; Internet; National Library of Medicine (U.S.); Sequence Analysis, Protein/methods; United States