Results 1 - 10 of 10
1.
Nucleic Acids Res ; 45(D1): D777-D783, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899578

ABSTRACT

COSMIC, the Catalogue of Somatic Mutations in Cancer (http://cancer.sanger.ac.uk) is a high-resolution resource for exploring targets and trends in the genetics of human cancer. Currently the broadest database of mutations in cancer, the information in COSMIC is curated by expert scientists, primarily by scrutinizing large numbers of scientific publications. Over 4 million coding mutations are described in v78 (September 2016), combining genome-wide sequencing results from 28 366 tumours with complete manual curation of 23 489 individual publications focused on 186 key genes and 286 key fusion pairs across all cancers. Molecular profiling of large tumour numbers has also allowed the annotation of more than 13 million non-coding mutations, 18 029 gene fusions, 187 429 genome rearrangements, 1 271 436 abnormal copy number segments, 9 175 462 abnormal expression variants and 7 879 142 differentially methylated CpG dinucleotides. COSMIC now details the genetics of drug resistance, novel somatic gene mutations which allow a tumour to evade therapeutic cancer drugs. Focusing initially on highly characterized drugs and genes, COSMIC v78 contains wide resistance mutation profiles across 20 drugs, detailing the recurrence of 301 unique resistance alleles across 1934 drug-resistant tumours. All information from the COSMIC database is available freely on the COSMIC website.


Subject(s)
Databases, Genetic , Mutation , Neoplasms/genetics , Computational Biology/methods , Drug Resistance, Neoplasm/genetics , Genome, Human , Genome-Wide Association Study/methods , Genomics/methods , Humans , Web Browser
2.
Article in English | MEDLINE | ID: mdl-27589961

ABSTRACT

Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and then were asked whether they were able to achieve the task and the level of difficulty in completing the task. In this manuscript, we describe the development of the interactive task, from planning to execution and discuss major findings for the systems tested. Database URL: http://www.biocreative.org.


Subject(s)
Data Curation/methods , Data Mining/methods , Automatic Data Processing/methods
3.
Database (Oxford) ; 2014(0): bau033, 2014.
Article in English | MEDLINE | ID: mdl-24715220

ABSTRACT

The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the 'tagtog' system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. DATABASE URL: www.tagtog.net, www.flybase.org.


Subject(s)
Data Mining/methods , Molecular Sequence Annotation/methods , Animals , Drosophila/genetics , Internet , Software , User-Computer Interface , Vocabulary, Controlled
4.
Nucleic Acids Res ; 42(Database issue): D780-8, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24234449

ABSTRACT

FlyBase (http://flybase.org) is the leading website and database of Drosophila genes and genomes. Whether you are using the fruit fly Drosophila melanogaster as an experimental system or wish to understand Drosophila biological knowledge in relation to human disease or to other model systems, FlyBase can help you successfully find the information you are looking for. Here, we demonstrate some of our more advanced searching systems and highlight some of our new tools for searching the wealth of data on FlyBase. The first section explores gene function in FlyBase, using our TermLink tool to search with Controlled Vocabulary terms and our new RNA-Seq Search tool to search gene expression. The second section of this article describes a few ways to search genomic data in FlyBase, using our BLAST server and the new implementation of GBrowse 2, as well as our new FeatureMapper tool. Finally, we move on to discuss our most powerful search tool, QueryBuilder, before describing pre-computed cuts of the data and how to query the database programmatically.


Subject(s)
Databases, Genetic , Drosophila/genetics , Genome, Insect , Animals , Drosophila melanogaster/genetics , Gene Expression Profiling , Gene Ontology , Genes, Insect , Internet , Phenotype , Sequence Analysis, RNA
5.
J Biomed Semantics ; 4(1): 30, 2013 Oct 18.
Article in English | MEDLINE | ID: mdl-24138933

ABSTRACT

BACKGROUND: Phenotype ontologies are queryable classifications of phenotypes. They provide a widely-used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions. RESULTS: We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely-used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable. CONCLUSIONS: The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes.

6.
BMC Bioinformatics ; 12: 175, 2011 May 19.
Article in English | MEDLINE | ID: mdl-21595960

ABSTRACT

BACKGROUND: Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five years and have resulted in the development of automated tools that can recognize entities within a document and link those entities to a relevant database. Unfortunately, automated tools cannot resolve ambiguities that arise from one term being used to signify entities that are quite distinct from one another. Instead, resolving these ambiguities requires some manual oversight. Finding the right balance between the speed and portability of automation and the accuracy and flexibility of manual effort is a crucial goal to making text markup a successful venture. RESULTS: We have established a journal article mark-up pipeline that links GENETICS journal articles and the model organism database (MOD) WormBase. This pipeline uses a lexicon built with entities from the database as a first step. The entity markup pipeline results in links from over nine classes of objects including genes, proteins, alleles, phenotypes and anatomical terms. New entities and ambiguities are discovered and resolved by a database curator through a manual quality control (QC) step, along with help from authors via a web form that is provided to them by the journal. New entities discovered through this pipeline are immediately sent to an appropriate curator at the database. Ambiguous entities that do not automatically resolve to one link are resolved by hand ensuring an accurate link. This pipeline has been extended to other databases, namely Saccharomyces Genome Database (SGD) and FlyBase, and has been implemented in marking up a paper with links to multiple databases. 
CONCLUSIONS: Our semi-automated pipeline hyperlinks articles published in GENETICS to model organism databases such as WormBase. Our pipeline results in interactive articles that are data rich with high accuracy. The use of a manual quality control step sets this pipeline apart from other hyperlinking tools and results in benefits to authors, journals, readers and databases.


Subject(s)
Databases, Factual , Periodicals as Topic , Animals , Biology/methods , Biology/trends , Caenorhabditis elegans/genetics , Databases, Genetic , Internet , Quality Control
7.
Virus Res ; 153(2): 226-33, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20709116

ABSTRACT

The complete genome sequence of an adenovirus, isolated from turkey and proposed to be turkey adenovirus type 1 (TAdV-1), was determined to extend our knowledge about the genome organisation and phylogeny of aviadenoviruses. The longest adenovirus genome, consisting of 45,412 bp, with the highest G+C content (of 67.55%) known to date, was found. The central part of the TAdV-1 genome has the conserved gene set and arrangement that are characteristic for every other adenovirus analysed to date. This genome core is flanked by the terminal early regions 1 and 4 (E1 and E4). Aviadenovirus-specific genus-common genes were found in these regions, each containing nine such open reading frames (ORFs). Additionally a type-specific novel ORF, designated as ORF50, was found in E4. Phylogenetic analysis as well as the presence of the genus-specific genes, splice sites and protease cleavage sites confirmed the classification of TAdV-1 in the genus Aviadenovirus. Intrageneric analyses of two genus-specific genes demonstrated the distinctness of TAdV-1 from other aviadenoviruses, thus supporting the proposal for the establishment of a new species, Turkey adenovirus B for TAdV-1.


Subject(s)
Aviadenovirus/genetics , Aviadenovirus/isolation & purification , DNA, Viral/chemistry , DNA, Viral/genetics , Genome, Viral , Turkeys/virology , Animals , Base Composition , Cluster Analysis , Gene Order , Genes, Viral , Molecular Sequence Data , Open Reading Frames , Phylogeny , Sequence Analysis, DNA
8.
DNA Seq ; 14(4): 233-9, 2003 Aug.
Article in English | MEDLINE | ID: mdl-14631647

ABSTRACT

Transcription factors of the SMAD family relay signals from cell surface receptors to the nucleus in response to TGF-beta related soluble factors. Members of the nuclear factor I/CAAT box binding family (NFI/CTF) have been implicated as regulators of diverse biological processes such as adenovirus replication and transcription of TGF-responsive genes. There are highly conserved DNA binding domains in SMAD and NFI/CTF transcription factors that allow sequence specific DNA binding for members of each family. However, no homology relationship has been established for the DNA binding domains present in these families. For a better understanding of the structure and evolution of SMAD genes, we carried out a sensitive PSI-BLAST database search. This revealed significant similarities between the DNA binding domains of SMADs and NFI/CTF transcription factors. Enhanced graphic matrix analysis and multiple sequence alignment of the amino acid sequences of the SMAD and NFI/CTF DNA binding domains also show that these two classes of domains share considerable structural similarity. These results strongly suggest that these two classes of factors share a homologous DNA binding domain presumably resulting from a common ancestry. In contrast, the C-terminal transcription modulation domains of both SMAD and NFI/CTF families do not show any sequence similarity. Based on the structural relationship of their DNA binding domains, we propose that the SMAD and NFI/CTF transcription factors belong to a new superfamily of genes.


Subject(s)
CCAAT-Enhancer-Binding Proteins/genetics , DNA-Binding Proteins/genetics , Multigene Family/genetics , Signal Transduction/genetics , Trans-Activators/genetics , Transcription Factors/genetics , Amino Acid Sequence , Animals , Databases, Genetic , Humans , Molecular Sequence Data , NFI Transcription Factors , Sequence Alignment , Smad Proteins
9.
Development ; 130(23): 5705-16, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14534137

ABSTRACT

Genetic evidence suggests that the Drosophila ectoderm is patterned by a spatial gradient of bone morphogenetic protein (BMP). Here we compare patterns of two related cellular responses: signal-dependent phosphorylation of the BMP-regulated R-SMAD, MAD, and signal-dependent changes in levels and sub-cellular distribution of the co-SMAD Medea. Our data demonstrate that nuclear accumulation of the co-SMAD Medea requires a BMP signal during blastoderm and gastrula stages. During this period, nuclear co-SMAD responses occur in three distinct patterns. At the end of blastoderm, a broad dorsal domain of weak SMAD response is detected. During early gastrulation, this domain narrows to a thin stripe of strong SMAD response at the dorsal midline. SMAD response levels continue to rise in the dorsal midline region during gastrulation, and flanking plateaus of weak responses are detected in dorsolateral cells. Thus, the thresholds for gene expression responses are implicit in the levels of SMAD responses during gastrulation. Both BMP ligands, DPP and Screw, are required for nuclear co-SMAD responses during these stages. The BMP antagonist Short gastrulation (SOG) is required to elevate peak responses at the dorsal midline as well as to depress responses in dorsolateral cells. The midline SMAD response gradient can form in embryos with reduced dpp gene dosage, but the peak level is reduced. These data support a model in which weak BMP activity during blastoderm defines the boundary between ventral neurogenic ectoderm and dorsal ectoderm. Subsequently, BMP activity creates a step gradient of SMAD responses that patterns the amnioserosa and dorsomedial ectoderm.


Subject(s)
Body Patterning , DNA-Binding Proteins/metabolism , Drosophila Proteins/metabolism , Drosophila melanogaster/embryology , Signal Transduction/physiology , Trans-Activators/metabolism , Transcription Factors/metabolism , Animals , Antibodies/metabolism , Bone Morphogenetic Proteins/genetics , Bone Morphogenetic Proteins/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/immunology , Drosophila Proteins/genetics , Drosophila melanogaster/anatomy & histology , Drosophila melanogaster/physiology , Gene Dosage , Ligands , Morphogenesis , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism , Smad4 Protein , Trans-Activators/genetics , Trans-Activators/immunology , Transcription Factors/genetics
10.
Comp Funct Genomics ; 4(6): 609-25, 2003.
Article in English | MEDLINE | ID: mdl-18629027

ABSTRACT

We describe the cloning, sequencing and structure of the human fast skeletal troponin T (TNNT3) gene located on chromosome 11p15.5. The single-copy gene comprises 19 exons and 18 introns. Eleven of these exons (1-3, 9-15 and 18) are constitutively spliced, whereas exons 4-8 are alternatively spliced. The gene contains an additional subset of developmentally regulated and alternatively spliced exons, including a foetal exon located between exons 8 and 9, exon 16 or alpha (adult) and exon 17 or beta (foetal and neonatal). Exon phasing suggests that the majority of the alternatively spliced exons located at the 5' end of the gene may have evolved as a result of exon shuffling, because they are of the same phase class. In contrast, the 3' exons encoding an evolutionarily conserved heptad repeat domain, shared by both TnT and troponin I (TnI), may be remnants of an ancient ancestral gene. The sequence of the 5' flanking region shows that the putative promoter contains motifs including binding sites for MyoD, MEF-2 and several transcription factors which may play a role in transcriptional regulation and tissue-specific expression of TnT. The coding region of TNNT3 exhibits strong similarity to the corresponding rat sequence. However, unlike the rat TnT gene, TNNT3 possesses two repeat regions of CCA and TC. The exclusive presence of these repetitive elements in the human gene indicates divergence in the evolutionary dynamics of mammalian TnT genes. Homologous muscle-specific splicing enhancer motifs are present in the introns upstream and downstream of the foetal exon, and may play a role in the developmental pattern of alternative splicing of the gene. The genomic correlates of TNNT3 are relevant to our understanding of the evolution and regulation of expression of the gene, as well as the structure and function of the protein isoforms. The nucleotide sequence of TNNT3 has been submitted to EMBL/GenBank under Accession No. AF026276.
