1.
BMC Bioinformatics ; 17: 43, 2016 Jan 20.
Article in English | MEDLINE | ID: mdl-26792120

ABSTRACT

BACKGROUND: Here we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. RESULTS: In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well-annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. CONCLUSIONS: PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.


Subjects
Bacterial Genome, Herbaspirillum/genetics, High-Throughput Nucleotide Sequencing/methods, Internet, Molecular Sequence Annotation/methods, Software, Computational Biology/methods, Computers, Water Microbiology
2.
J Autoimmun ; 50: 77-82, 2014 May.
Article in English | MEDLINE | ID: mdl-24387802

ABSTRACT

Previous cross-sectional analyses demonstrated that CD8(+) and CD4(+) T-cell reactivity to islet-specific antigens was more prevalent in T1D subjects than in healthy donors (HD). Here, we examined T1D-associated epitope-specific CD4(+) T-cell cytokine production and autoreactive CD8(+) T-cell frequency on a monthly basis for one year in 10 HD, 33 subjects with T1D, and 15 subjects with T2D. Autoreactive CD4(+) T-cells from both T1D and T2D subjects produced more IFN-γ when stimulated than cells from HD. In contrast, higher frequencies of islet antigen-specific CD8(+) T-cells were detected only in T1D. These observations support the hypothesis that general beta-cell stress drives autoreactive CD4(+) T-cell activity while islet over-expression of MHC class I commonly seen in T1D mediates amplification of CD8(+) T-cells and more rapid beta-cell loss. In conclusion, CD4(+) T-cell autoreactivity appears to be present in both T1D and T2D while autoreactive CD8(+) T-cells are unique to T1D. Thus, autoreactive CD8(+) cells may serve as a more T1D-specific biomarker.


Subjects
Autoantigens/immunology, CD4-Positive T-Lymphocytes/immunology, CD8-Positive T-Lymphocytes/immunology, Type 1 Diabetes Mellitus/immunology, Type 2 Diabetes Mellitus/immunology, Islets of Langerhans/immunology, Adult, Aged, CD4-Positive T-Lymphocytes/pathology, CD8-Positive T-Lymphocytes/pathology, Case-Control Studies, Immunologic Cytotoxicity, Type 1 Diabetes Mellitus/pathology, Type 2 Diabetes Mellitus/pathology, ELISPOT, Female, Humans, Interferon-gamma/biosynthesis, Islets of Langerhans/pathology, Longitudinal Studies, Male, Middle Aged
3.
BMC Bioinformatics ; 13: 321, 2012 Dec 02.
Article in English | MEDLINE | ID: mdl-23198735

ABSTRACT

BACKGROUND: Methods of weakening and attenuating pathogens' abilities to infect and propagate in a host, thus allowing the natural immune system to more easily decimate invaders, have gained attention as alternatives to broad-spectrum targeting approaches. The following work describes a technique for identifying proteins involved in virulence by relying on latent information computationally gathered across biological repositories, applicable to both generic and specific virulence categories. RESULTS: A lightweight method for data integration is used, which links information regarding a protein via a path-based query graph. A method of weighting is then applied to query graphs that can serve as input to various statistical classification methods for discrimination, and the combined usage of data integration and learning methods is tested against the problem of both generalized and specific virulence function prediction. CONCLUSIONS: This approach improves coverage of functional data over a protein. Moreover, while depending largely on noisy and potentially non-curated data from public sources, we find it outperforms other techniques for identification of general virulence factors and baseline remote homology detection methods for specific virulence categories.


Subjects
Proteins/classification, Protein Sequence Analysis/methods, Protein Sequence Analysis/statistics & numerical data, Virulence Factors/classification, Statistical Data Interpretation, Protein Databases, Proteins/chemistry, Virulence, Virulence Factors/chemistry
4.
BMC Res Notes ; 5: 96, 2012 Feb 14.
Article in English | MEDLINE | ID: mdl-22333139

ABSTRACT

BACKGROUND: Genes conferring antibiotic resistance to groups of bacterial pathogens are cause for considerable concern, as many once-reliable antibiotics continue to see a reduction in efficacy. The recent discovery of the metallo-β-lactamase blaNDM-1 gene, which appears to grant antibiotic resistance to a variety of Enterobacteriaceae via a mobile plasmid, is one example of this distressing trend. The following work describes a computational analysis of pathogen-borne MBLs that focuses on the structural aspects of characterized proteins. RESULTS: Using both sequence and structural analyses, we examine residues and structural features specific to various pathogen-borne MBL types. This analysis identifies a linker region within MBL-like folds that may act as a discriminating structural feature between these proteins, and specifically resistance-associated acquirable MBLs. Recently released crystal structures of the newly emerged NDM-1 protein were aligned against related MBL structures using a variety of global and local structural alignment methods, and the overall fold conformation is examined for structural conservation. Conservation appears to be present in most areas of the protein, yet is strikingly absent within a linker region, making NDM-1 unique with respect to a linker-based classification scheme. Variability analysis of the NDM-1 crystal structure highlights unique residues in key regions as well as identifying several characteristics shared with other transferable MBLs. CONCLUSIONS: A discriminating linker region identified in MBL proteins is highlighted and examined in the context of NDM-1 and primarily three other MBL types: IMP-1, VIM-2 and ccrA. The presence of an unusual linker region variant and uncommon amino acid composition at specific structurally important sites may help to explain the unusually broad kinetic profile of NDM-1 and may aid in directing research attention to areas of this protein, and possibly other MBLs, that may be targeted for inactivation or attenuation of enzymatic activity.

5.
J Biomed Semantics ; 2 Suppl 3: S2, 2011.
Article in English | MEDLINE | ID: mdl-21992591

ABSTRACT

BACKGROUND: Extracting medication information from clinical records has many potential applications, and recently published research, systems, and competitions reflect an interest therein. Much of the early extraction work involved rules and lexicons, but more recently machine learning has been applied to the task. METHODS: We present a hybrid system consisting of two parts. The first part, field detection, uses a cascade of statistical classifiers to identify medication-related named entities. The second part uses simple heuristics to link those entities into medication events. RESULTS: The system achieved performance that is comparable to other approaches to the same task. This performance is further improved by adding features that reference external medication name lists. CONCLUSIONS: This study demonstrates that our hybrid approach outperforms purely statistical or rule-based systems. The study also shows that a cascade of classifiers works better than a single classifier in extracting medication information. The system is available as is upon request from the first author.

6.
J Am Med Inform Assoc ; 17(5): 514-8, 2010.
Article in English | MEDLINE | ID: mdl-20819854

ABSTRACT

The Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records focused on the identification of medications, their dosages, modes (routes) of administration, frequencies, durations, and reasons for administration in discharge summaries. This challenge is referred to as the medication challenge. For the medication challenge, i2b2 released detailed annotation guidelines along with a set of annotated discharge summaries. Twenty teams representing 23 organizations and nine countries participated in the medication challenge. The teams produced rule-based, machine learning, and hybrid systems targeted to the task. Although rule-based systems dominated the top 10, the best performing system was a hybrid. Of all medication-related fields, durations and reasons were the most difficult for all systems to detect. While medications themselves were identified with better than 0.75 F-measure by all of the top 10 systems, the best F-measures for durations and reasons were 0.525 and 0.459, respectively. State-of-the-art natural language processing systems go a long way toward extracting medication names, dosages, modes, and frequencies. However, they are limited in recognizing duration and reason fields and would benefit from future research.


Subjects
Electronic Health Records, Information Storage and Retrieval/methods, Natural Language Processing, Pharmaceutical Preparations, Hybrid Computers, Humans, Patient Dropouts
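
The F-measure scores quoted in the abstract above combine precision and recall into a single number. The abstract does not say which variant the challenge used, so the following is only a minimal sketch that assumes the balanced form (F1, the harmonic mean of precision and recall) computed from raw counts. The function name `f_measure` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int, beta: float = 1.0) -> float:
    """Return the F-beta score computed from raw counts.

    Assumption: beta = 1.0 gives the balanced F1 score; the abstract
    does not state which variant the challenge actually reported.
    Returns 0.0 when there are no true positives, where precision
    and recall are zero or undefined.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta_squared = beta * beta
    return ((1 + beta_squared) * precision * recall
            / (beta_squared * precision + recall))


# Hypothetical counts, chosen only to illustrate the calculation:
# precision = 45 / 75 = 0.60, recall = 45 / 100 = 0.45
print(round(f_measure(true_positives=45, false_positives=30,
                      false_negatives=55), 3))  # prints 0.514
```

With these illustrative counts, a system that finds fewer than half of the true fields scores near 0.5 despite a precision of 0.60, which is the kind of gap the abstract reports between medication names and the harder duration and reason fields.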
7.
J Am Med Inform Assoc ; 17(5): 519-23, 2010.
Article in English | MEDLINE | ID: mdl-20819855

ABSTRACT

OBJECTIVE: Within the context of the Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records, the authors (also referred to as 'the i2b2 medication challenge team' or 'the i2b2 team' for short) organized a community annotation experiment. DESIGN: For this experiment, the authors released annotation guidelines and a small set of annotated discharge summaries. They asked the participants of the Third i2b2 Workshop to annotate 10 discharge summaries per person; each discharge summary was annotated by two annotators from two different teams, and a third annotator from a third team resolved disagreements. MEASUREMENTS: In order to evaluate the reliability of the annotations thus produced, the authors measured community inter-annotator agreement and compared it with the inter-annotator agreement of expert annotators when both the community and the expert annotators generated ground truth based on pooled system outputs. For this purpose, the pool consisted of the three most densely populated automatic annotations of each record. The authors also compared the community inter-annotator agreement with expert inter-annotator agreement when the experts annotated raw records without using the pool. Finally, they measured the quality of the community ground truth by comparing it with the expert ground truth. RESULTS AND CONCLUSIONS: The authors found that the community annotators achieved comparable inter-annotator agreement to expert annotators, regardless of whether the experts annotated from the pool. Furthermore, the ground truth generated by the community obtained F-measures above 0.90 against the ground truth of the experts, indicating the value of the community as a source of high-quality ground truth even on intricate and domain-specific annotation tasks.


Subjects
Electronic Health Records, Information Storage and Retrieval/methods, Natural Language Processing, Pharmaceutical Preparations, Humans, Patient Discharge
8.
J Biomed Inform ; 43(6): 873-82, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20643225

ABSTRACT

Though there have been many advances in providing access to linked and integrated biomedical data across repositories, developing methods that allow users to specify ambiguous and exploratory queries over disparate sources remains a challenge to extracting well-curated or diversely supported biological information. In the following work, we discuss the concepts of data coverage and evidence in the context of integrated sources. We address diverse information retrieval via a simple framework for representing coverage and evidence that operates in parallel with an arbitrary schema, and a language upon which queries on the schema and framework may be executed. We show that this approach is capable of answering questions that require ranged levels of evidence or triangulation, and demonstrate that appropriately formed queries can significantly improve the level of precision when retrieving well-supported biomedical data.


Subjects
Factual Databases, Information Storage and Retrieval/methods, Biomedical Research, Internet, Semantics
9.
J Biomed Inform ; 43(3): 407-18, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20015478

ABSTRACT

Genome-wide association studies (GWAS) are an important approach to understanding the genetic mechanisms behind human diseases. Single nucleotide polymorphisms (SNPs) are the predominant markers used in genome-wide association studies, and the ability to predict which SNPs are likely to be functional is important for both a priori and a posteriori analyses of GWA studies. This article describes the design, implementation and evaluation of a family of systems for the purpose of identifying SNPs that may cause a change in phenotypic outcomes. The methods described in this article characterize the feasibility of combinations of logical and probabilistic inference with federated data integration for both point and regional SNP annotation and analysis. Evaluations of the methods demonstrate the overall strong predictive value of logical, and logical with probabilistic, inference applied to the domain of SNP annotation.


Subjects
Statistical Models, Single Nucleotide Polymorphism, Genetic Databases, Genome-Wide Association Study/methods, Logic
10.
AMIA Annu Symp Proc ; : 889, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18999001

ABSTRACT

In the following work, we test a generalized approach to integrating, transforming and learning data from disparate data sources for the classification of bacterial proteins involved in pathogenesis. We rely on the implicit inter-linkages between biological databases to draw relevant records, and leverage statistical learning methods to infer classification based on abundant, albeit noisy, data. Results suggest that types of public biological information have varying degrees of effectiveness in predictive data mining.


Subjects
Artificial Intelligence, Bacterial Proteins/classification, Bacterial Toxins/classification, Protein Databases, Automated Pattern Recognition/methods, Terminology as Topic, Virulence Factors/classification, Algorithms, Information Storage and Retrieval/methods, Natural Language Processing
11.
Pac Symp Biocomput ; : 343-54, 2007.
Article in English | MEDLINE | ID: mdl-17990504

ABSTRACT

Scientists working on genomics projects are often faced with the difficult task of sifting through large amounts of biological information dispersed across various online data sources that are relevant to their area or organism of research. Gene annotation, the process of identifying the functional role of a possible gene, in particular has become increasingly time-consuming and laborious to conduct as more genomes are sequenced and the number of candidate genes continues to increase at a near-exponential pace; genes are left un-annotated, or worse, incorrectly annotated. Many groups have attempted to address the annotation backlog through automated annotation systems that are geared toward specific organisms, and which may thus not possess the necessary flexibility and scalability to annotate other genomes. In this paper, we present a method and framework that attempts to address problems inherent in manual and automatic annotation by coupling a data integration system, BioMediator, to an inference engine with the aim of elucidating functional annotations. The framework and heuristics developed are not specific to any particular genome. We validated the method with a set of randomly selected annotated sequences from a variety of organisms. Preliminary results show that the hybrid data integration and inference approach generates functional annotations that are as good as or better than "gold standard" annotations approximately 80% of the time.


Subjects
Computational Biology, Genetic Databases, Genomics/statistics & numerical data, Computer Systems, Statistical Data Interpretation, Expert Systems, Software
12.
Science ; 309(5733): 404-9, 2005 Jul 15.
Article in English | MEDLINE | ID: mdl-16020724

ABSTRACT

A comparison of gene content and genome architecture of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major, three related pathogens with different life cycles and disease pathology, revealed a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters. Many species-specific genes, especially large surface antigen families, occur at nonsyntenic chromosome-internal and subtelomeric regions. Retroelements, structural RNAs, and gene family expansion are often associated with syntenic discontinuities that, along with gene divergence, acquisition and loss, and rearrangement within the syntenic regions, have shaped the genomes of each parasite. Contrary to recent reports, our analyses reveal no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.


Subjects
Protozoan Genome, Leishmania major/genetics, Proteome, Protozoan Proteins/genetics, Trypanosoma brucei brucei/genetics, Trypanosoma cruzi/genetics, Animals, Biological Evolution, Chromosomes/genetics, Molecular Evolution, Horizontal Gene Transfer, Protozoan Genes, Genomics, Leishmania major/chemistry, Leishmania major/metabolism, Molecular Sequence Data, Multigene Family, Mutation, Phylogeny, Plastids/genetics, Protozoan Proteins/chemistry, Protozoan Proteins/physiology, Genetic Recombination, Retroelements, Species Specificity, Symbiosis, Synteny, Telomere/genetics, Trypanosoma brucei brucei/chemistry, Trypanosoma brucei brucei/metabolism, Trypanosoma cruzi/chemistry, Trypanosoma cruzi/metabolism
13.
Science ; 309(5733): 409-15, 2005 Jul 15.
Article in English | MEDLINE | ID: mdl-16020725

ABSTRACT

Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.


Subjects
Protozoan Genome, Protozoan Proteins/genetics, DNA Sequence Analysis, Trypanosoma cruzi/genetics, Animals, Chagas Disease/drug therapy, Chagas Disease/parasitology, DNA Repair, DNA Replication, Mitochondrial DNA/genetics, Protozoan DNA/genetics, Protozoan Genes, Humans, Meiosis, Membrane Proteins/chemistry, Membrane Proteins/genetics, Membrane Proteins/physiology, Multigene Family, Protozoan Proteins/chemistry, Protozoan Proteins/physiology, Genetic Recombination, Repetitive Nucleic Acid Sequences, Retroelements, Signal Transduction, Telomere/genetics, Trypanocidal Agents/pharmacology, Trypanocidal Agents/therapeutic use, Trypanosoma cruzi/chemistry, Trypanosoma cruzi/physiology
14.
Science ; 309(5733): 436-42, 2005 Jul 15.
Article in English | MEDLINE | ID: mdl-16020728

ABSTRACT

Leishmania species cause a spectrum of human diseases in tropical and subtropical regions of the world. We have sequenced the 36 chromosomes of the 32.8-megabase haploid genome of Leishmania major (Friedlin strain) and predict 911 RNA genes, 39 pseudogenes, and 8272 protein-coding genes, of which 36% can be ascribed a putative function. These include genes involved in host-pathogen interactions, such as proteolytic enzymes, and extensive machinery for synthesis of complex surface glycoconjugates. The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling. Abundant RNA-binding proteins are encoded in the Tritryp genomes, consistent with active posttranscriptional regulation of gene expression.


Subjects
Protozoan Genome, Leishmania major/genetics, DNA Sequence Analysis, Animals, Chromatin/genetics, Chromatin/metabolism, Gene Expression Regulation, Protozoan Genes, rRNA Genes, Glycoconjugates/biosynthesis, Glycoconjugates/metabolism, Leishmania major/chemistry, Leishmania major/metabolism, Cutaneous Leishmaniasis/parasitology, Lipid Metabolism, Membrane Proteins/biosynthesis, Membrane Proteins/chemistry, Membrane Proteins/genetics, Membrane Proteins/metabolism, Molecular Sequence Data, Multigene Family, Protein Biosynthesis, Post-Translational Protein Processing, Protozoan Proteins/biosynthesis, Protozoan Proteins/chemistry, Protozoan Proteins/genetics, Protozoan Proteins/metabolism, Post-Transcriptional RNA Processing, RNA Splicing, Protozoan RNA/genetics, Protozoan RNA/metabolism, Genetic Transcription