Results 1 - 16 of 16
1.
Nucleic Acids Res ; 37(Web Server issue): W23-7, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19420058

ABSTRACT

BioMart Central Portal (www.biomart.org) offers a one-stop shop solution to access a wide array of biological databases. These include major biomolecular sequence, pathway and annotation databases such as Ensembl, Uniprot, Reactome, HGNC, Wormbase and PRIDE; for a complete list, visit http://www.biomart.org/biomart/martview. Moreover, the web server features seamless data federation, enabling cross-querying of these data sources in a user-friendly and unified way. The web server not only provides access through a web interface (MartView), it also supports programmatic access through a Perl API as well as RESTful and SOAP oriented web services. The website is free and open to all users and there is no login requirement.
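The programmatic access mentioned in this abstract can be illustrated with a minimal sketch that only builds a request and never sends it. It assumes the XML query layout commonly used with BioMart web services (a `Query` element wrapping a `Dataset` with `Filter` and `Attribute` children). The service path `/biomart/martservice`, the dataset name and the attribute and filter names below are assumptions for illustration and are not taken from the abstract.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_query_xml(dataset, attributes, filters=None):
    """Return a BioMart-style XML query document as a string (assumed layout)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # ask for tab-separated output
        "header": "0",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict the rows; attributes choose the columns.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def build_request_url(base_url, query_xml):
    """URL-encode the XML into a single 'query' parameter."""
    return base_url + "?" + urlencode({"query": query_xml})


# Illustrative names only; check the target mart for the real ones.
xml_text = build_query_xml(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name"],
    filters={"chromosome_name": "1"},
)
url = build_request_url("http://www.biomart.org/biomart/martservice", xml_text)
```

Keeping query construction separate from transport means the same XML could be passed to any HTTP client, or inspected before anything is sent.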


Subjects
Databases, Genetic , Software , Biology , Internet , User-Computer Interface
2.
BMC Genomics ; 10: 22, 2009 Jan 14.
Article in English | MEDLINE | ID: mdl-19144180

ABSTRACT

BACKGROUND: Biologists need to perform complex queries, often across a variety of databases. Typically, each data resource provides an advanced query interface, each of which must be learnt by the biologist before they can begin to query them. Frequently, more than one data source is required, and for high-throughput analysis, cutting and pasting results between websites is very time consuming. Therefore, many groups rely on local bioinformatics support to process queries by accessing the resource's programmatic interfaces if they exist. This is not an efficient solution in terms of cost and time. Instead, it would be better if the biologist only had to learn one generic interface. BioMart provides such a solution. RESULTS: BioMart enables scientists to perform advanced querying of biological data sources through a single web interface. The power of the system comes from integrated querying of data sources regardless of their geographical locations. Once these queries have been defined, they may be automated with its "scripting at the click of a button" functionality. BioMart's capabilities are extended by integration with several widely used software packages such as BioConductor, DAS, Galaxy, Cytoscape and Taverna. In this paper, we describe all aspects of BioMart from a user's perspective and demonstrate how it can be used to solve real biological use cases such as SNP selection for candidate gene screening or annotation of microarray results. CONCLUSION: BioMart is an easy to use, generic and scalable system and has therefore become an integral part of large data resources including Ensembl, UniProt, HapMap, Wormbase, Gramene, Dictybase, PRIDE, MSD and Reactome. BioMart is freely accessible to use at http://www.biomart.org.


Subjects
Computational Biology/methods , Database Management Systems , Databases, Genetic , Internet , User-Computer Interface
3.
Nat Commun ; 9(1): 4746, 2018 11 12.
Article in English | MEDLINE | ID: mdl-30420699

ABSTRACT

Biomarkers lie at the heart of precision medicine. Surprisingly, while rapid genomic profiling is becoming ubiquitous, the development of biomarkers usually involves the application of bespoke techniques that cannot be directly applied to other datasets. There is an urgent need for a systematic methodology to create biologically-interpretable molecular models that robustly predict key phenotypes. Here we present SIMMS (Subnetwork Integration for Multi-Modal Signatures): an algorithm that fragments pathways into functional modules and uses these to predict phenotypes. We apply SIMMS to multiple data types across five diseases, and in each it reproducibly identifies known and novel subtypes, and makes superior predictions to the best bespoke approaches. To demonstrate its ability on a new dataset, we profile 33 genes/nodes of the PI3K pathway in 1734 FFPE breast tumors and create a four-subnetwork prediction model. This model out-performs a clinically-validated molecular test in an independent cohort of 1742 patients. SIMMS is generic and enables systematic data integration for robust biomarker discovery.


Subjects
Algorithms , Biomarkers, Tumor/analysis , Metabolic Networks and Pathways , Neoplasms/metabolism , Benchmarking , Cell Proliferation , Humans , Signal Transduction , Treatment Outcome
4.
PLoS Biol ; 2(6): e162, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15103394

ABSTRACT

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.


Subjects
Computational Biology/methods , DNA, Complementary/genetics , Databases, Genetic , Genes/physiology , Genome, Human , Alternative Splicing/genetics , Genes/genetics , Humans , Internet , Microsatellite Repeats/genetics , Open Reading Frames/genetics , Polymorphism, Genetic , Polymorphism, Single Nucleotide , Protein Structure, Tertiary
5.
Clin Cancer Res ; 21(6): 1477-86, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25609067

ABSTRACT

PURPOSE: While the dysregulation of specific pathways in cancer influences both treatment response and outcome, few current prognostic markers explicitly consider differential pathway activation. Here we explore this concept, focusing on K-Ras mutations in lung adenocarcinoma (present in 25%-35% of patients). EXPERIMENTAL DESIGN: The effect of K-Ras mutation status on prognostic accuracy of existing signatures was evaluated in 404 patients. Genes associated with K-Ras mutation status were identified and used to create a RAS pathway activation classifier to provide a more accurate measure of RAS pathway status. Next, 8 million random signatures were evaluated to assess differences in prognosing patients with or without RAS activation. Finally, a prognostic signature was created to target patients with RAS pathway activation. RESULTS: We first show that K-Ras status influences the accuracy of existing prognostic signatures, which are effective in K-Ras-wild-type patients but fail in patients with K-Ras mutations. Next, we show that it is fundamentally more difficult to predict the outcome of patients with RAS activation (RAS(mt)) than that of those without (RAS(wt)). More importantly, we demonstrate that different signatures are prognostic in RAS(wt) and RAS(mt). Finally, to exploit this discovery, we create separate prognostic signatures for RAS(wt) and RAS(mt) patients and show that combining them significantly improves predictions of patient outcome. CONCLUSIONS: We present a nested model for integrated genomic and transcriptomic data. This model is general and is not limited to lung adenocarcinomas but can be expanded to other tumor types and oncogenes.
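The nested idea described in this abstract (first estimate RAS pathway status, then apply a status-specific prognostic signature) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the weighted-sum scoring, the threshold, and every gene name and weight are hypothetical placeholders.

```python
def signature_score(expression, weights):
    """Weighted sum of expression values; genes missing from the profile count as 0."""
    return sum(w * expression.get(gene, 0.0) for gene, w in weights.items())


def nested_risk(expression, ras_classifier, sig_wt, sig_mt, threshold=0.0):
    """Two-stage scoring: classify RAS activation, then use the matching signature.

    All weights and the threshold are hypothetical; the abstract does not
    specify how either stage is computed.
    """
    ras_active = signature_score(expression, ras_classifier) > threshold
    weights = sig_mt if ras_active else sig_wt
    label = "RAS(mt)" if ras_active else "RAS(wt)"
    return label, signature_score(expression, weights)


# Placeholder gene names and weights, chosen only to exercise both branches.
ras_classifier = {"geneA": 1.0}
sig_wt = {"geneB": -2.0}
sig_mt = {"geneB": 3.0}

patient_1 = {"geneA": 2.0, "geneB": 1.0}    # classified as RAS-activated
patient_2 = {"geneA": -1.0, "geneB": 1.0}   # classified as not RAS-activated

result_1 = nested_risk(patient_1, ras_classifier, sig_wt, sig_mt)
result_2 = nested_risk(patient_2, ras_classifier, sig_wt, sig_mt)
```

The point of the structure is that the same gene can carry different prognostic weight depending on the first-stage call, which a single global signature cannot express.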


Subjects
Adenocarcinoma/genetics , Adenocarcinoma/mortality , Lung Neoplasms/genetics , Lung Neoplasms/mortality , Mutation/genetics , Proto-Oncogene Proteins/genetics , ras Proteins/genetics , Adenocarcinoma of Lung , Enzyme Activation/genetics , Gene Expression Profiling , Humans , Models, Theoretical , Prognosis , Proto-Oncogene Proteins/metabolism , Proto-Oncogene Proteins p21(ras) , ras Proteins/metabolism
6.
J Biomed Semantics ; 4(1): 6, 2013 Feb 11.
Article in English | MEDLINE | ID: mdl-23398680

ABSTRACT

BACKGROUND: BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research. RESULTS: The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and interoperability of resources. We consequently developed tools and clients for analysis and visualization. CONCLUSION: We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer.

7.
Radiother Oncol ; 102(3): 436-43, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22356756

ABSTRACT

BACKGROUND AND PURPOSE: Recent data suggest that in vitro and in vivo derived hypoxia gene-expression signatures have prognostic power in breast and possibly other cancers. However, both tumour hypoxia and the biological adaptation to this stress are highly dynamic. Assessment of time-dependent gene-expression changes in response to hypoxia may thus provide additional biological insights and assist in predicting the impact of hypoxia on patient prognosis. MATERIALS AND METHODS: Transcriptome profiling was performed for three cell lines derived from diverse tumour-types after hypoxic exposure at eight time-points, which include a normoxic time-point. Time-dependent sets of co-regulated genes were identified from these data. Subsequently, gene ontology (GO) and pathway analyses were performed. The prognostic power of these novel signatures was assessed in parallel with previous in vitro and in vivo derived hypoxia signatures in a large breast cancer microarray meta-dataset (n=2312). RESULTS: We identified seven recurrent temporal and two general hypoxia signatures. GO and pathway analyses revealed regulation of both common and unique underlying biological processes within these signatures. None of the new or previously published in vitro signatures consisting of hypoxia-induced genes were prognostic in the large breast cancer dataset. In contrast, signatures of repressed genes, as well as the in vivo derived signatures of hypoxia-induced genes showed clear prognostic power. CONCLUSIONS: Only a subset of hypoxia-induced genes in vitro demonstrates prognostic value when evaluated in a large clinical dataset. Despite clear evidence of temporal patterns of gene-expression in vitro, the subset of prognostic hypoxia regulated genes cannot be identified based on temporal pattern alone. In vivo derived signatures appear to identify the prognostic hypoxia induced genes. 
The prognostic value of hypoxia-repressed genes is likely a surrogate for the known importance of proliferation in breast cancer outcome.


Subjects
Breast Neoplasms/metabolism , Breast Neoplasms/mortality , Gene Expression Profiling , Hypoxia/metabolism , Breast Neoplasms/pathology , Cell Line, Tumor , Female , Humans , Principal Component Analysis , Prognosis
8.
Database (Oxford) ; 2011: bar038, 2011.
Article in English | MEDLINE | ID: mdl-21930506

ABSTRACT

BioMart is a freely available, open source, federated database system that provides a unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework. BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects between different research groups. BioMart contains several levels of query optimization to efficiently manage large data sets and offers a diverse selection of graphical user interfaces and application programming interfaces to ensure that queries can be performed in whatever manner is most convenient for the user. The software has now been adopted by a large number of different biological databases spanning a wide range of data types and providing a rich source of annotation available to bioinformaticians and biologists alike.
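The federation described in this abstract, where separately hosted sources are presented as one, can be illustrated with a toy join over two in-memory tables that share an identifier. This is a conceptual sketch only and not BioMart's implementation; the function name, column names and record values are made-up placeholders.

```python
def federated_query(primary, secondary, key, wanted):
    """Join rows from two independent sources on a shared key column.

    Each source is a list of dicts standing in for a remote database;
    'wanted' lists the output columns, which may come from either source.
    """
    # Index the secondary source once so each primary row is a dict lookup.
    index = {}
    for row in secondary:
        index.setdefault(row[key], []).append(row)

    results = []
    for row in primary:
        for match in index.get(row[key], []):
            merged = {**row, **match}
            results.append({col: merged[col] for col in wanted})
    return results


# Placeholder records; the identifiers and values are not real data.
gene_source = [
    {"gene_id": "GENE_A", "chromosome": "1"},
    {"gene_id": "GENE_B", "chromosome": "2"},
]
pathway_source = [
    {"gene_id": "GENE_A", "pathway": "pathway_x"},
    {"gene_id": "GENE_A", "pathway": "pathway_y"},
]

rows = federated_query(gene_source, pathway_source, "gene_id",
                       ["gene_id", "chromosome", "pathway"])
```

Rows without a partner in the second source are dropped here (inner-join behaviour); a real federated system would also have to deal with network transport and query optimization, which this sketch ignores.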


Subjects
Computational Biology , Database Management Systems , Databases, Factual , Internet , User-Computer Interface , Cooperative Behavior , International Cooperation
9.
Database (Oxford) ; 2011: bar026, 2011.
Article in English | MEDLINE | ID: mdl-21930502

ABSTRACT

The International Cancer Genome Consortium (ICGC) is a collaborative effort to characterize genomic abnormalities in 50 different cancer types. To make this data available, the ICGC has created the ICGC Data Portal. Powered by the BioMart software, the Data Portal allows each ICGC member institution to manage and maintain its own databases locally, while seamlessly presenting all the data in a single access point for users. The Data Portal currently contains data from 24 cancer projects, including ICGC, The Cancer Genome Atlas (TCGA), Johns Hopkins University, and the Tumor Sequencing Project. It consists of 3478 genomes and 13 cancer types and subtypes. Available open access data types include simple somatic mutations, copy number alterations, structural rearrangements, gene expression, microRNAs, DNA methylation and exon junctions. Additionally, simple germline variations are available as controlled access data. The Data Portal uses a web-based graphical user interface (GUI) to offer researchers multiple ways to quickly and easily search and analyze the available data. The web interface can assist in constructing complicated queries across multiple data sets. Several application programming interfaces are also available for programmatic access. Here we describe the organization, functionality, and capabilities of the ICGC Data Portal.


Subjects
Database Management Systems , Databases, Factual , Genomics , Neoplasms/genetics , Gene Expression Profiling , Genetic Variation , Humans , International Cooperation , Internet , Societies , User-Computer Interface
10.
J Biomed Semantics ; 2: 4, 2011 Aug 02.
Article in English | MEDLINE | ID: mdl-21806842

ABSTRACT

BACKGROUND: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. RESULTS: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. CONCLUSIONS: Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. 
We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.

12.
Bioinformatics ; 21(16): 3439-40, 2005 Aug 15.
Article in English | MEDLINE | ID: mdl-16082012

ABSTRACT

biomaRt is a new Bioconductor package that integrates BioMart data resources with data analysis software in Bioconductor. It can annotate a wide range of gene or gene product identifiers (e.g. Entrez-Gene and Affymetrix probe identifiers) with information such as gene symbol, chromosomal coordinates, Gene Ontology and OMIM annotation. Furthermore, biomaRt enables retrieval of genomic sequences and single nucleotide polymorphism information, which can be used in data analysis. Fast and up-to-date data retrieval is possible as the package executes direct SQL queries to the BioMart databases (e.g. Ensembl). The biomaRt package provides a tight integration of large, public or locally installed BioMart databases with data analysis in Bioconductor, creating a powerful environment for biological data mining.


Subjects
Database Management Systems , Databases, Protein , Gene Expression Profiling/methods , Information Storage and Retrieval/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Algorithms , Systems Integration
13.
Genome Res ; 14(1): 160-9, 2004 Jan.
Article in English | MEDLINE | ID: mdl-14707178

ABSTRACT

The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. The system consists of a query-optimized database and interactive, user-friendly interfaces. EnsMart has been applied to Ensembl, where it extends its genomic browser capabilities, facilitating rapid retrieval of customized data sets. A wide variety of complex queries, on various types of annotations, for numerous species are supported. These can be applied to many research problems, ranging from SNP selection for candidate gene screening, through cross-species evolutionary comparisons, to microarray annotation. Users can group and refine biological data according to many criteria, including cross-species analyses, disease links, sequence variations, and expression patterns. Both tabulated list data and biological sequence output can be generated dynamically, in HTML, text, Microsoft Excel, and compressed formats. A wide range of sequence types, such as cDNA, peptides, coding regions, UTRs, and exons, with additional upstream and downstream regions, can be retrieved. The EnsMart database can be accessed via a public Web site, or through a Java application suite. Both implementations and the database are freely available for local installation, and can be extended or adapted to 'non-Ensembl' data sets.


Subjects
Database Management Systems , Databases, Genetic , Animals , Base Sequence , Computational Biology/methods , Computer Graphics , Genes/genetics , Humans , Mice , Molecular Sequence Data , Rats , Software , User-Computer Interface
14.
Blood ; 99(12): 4638-41, 2002 Jun 15.
Article in English | MEDLINE | ID: mdl-12036901

ABSTRACT

The 5q- syndrome is the most distinct of the myelodysplastic syndromes, and the molecular basis for this disorder remains unknown. We describe the narrowing of the common deleted region (CDR) of the 5q- syndrome to the approximately 1.5-megabases interval at 5q32 flanked by D5S413 and the GLRA1 gene. The Ensembl gene prediction program has been used for the complete genomic annotation of the CDR. The CDR is gene rich and contains 24 known genes and 16 novel (predicted) genes. Of 40 genes in the CDR, 33 are expressed in CD34(+) cells and, therefore, represent candidate genes since they are expressed within the hematopoietic stem/progenitor cell compartment. A number of the genes assigned to the CDR represent good candidates for the 5q- syndrome, including MEGF1, G3BP, and several of the novel gene predictions. These data now afford a comprehensive mutational/expression analysis of all candidate genes assigned to the CDR.


Subjects
Chromosome Mapping , Chromosomes, Human, Pair 5 , Gene Deletion , Myelodysplastic Syndromes/genetics , Antigens, CD34 , Chromosome Mapping/methods , Humans , Stem Cells/immunology , Stem Cells/metabolism
15.
Genome Res ; 14(5): 925-8, 2004 May.
Article in English | MEDLINE | ID: mdl-15078858

ABSTRACT

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to "widen" this biological integration to include other model organisms relevant to understanding human biology as they become available; to "deepen" this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.


Subjects
Computational Biology/trends