ABSTRACT
Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care, as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.
Subject(s)
Biomedical Research; Genome, Human; Human Genome Project; Europe; Humans
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Biological Science Disciplines; Computational Biology
ABSTRACT
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled "An intrinsically disordered protein user community proposal for ELIXIR" held at the University of Padua. The workshop, and further consultation with members of the wider IDP community, identified the key priority areas for the roadmap, including the development of standards for data annotation, storage and dissemination; the integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these priority areas, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders.
Subject(s)
Intrinsically Disordered Proteins/metabolism
ABSTRACT
BACKGROUND: Despite increasing interest in applying Natural Language Processing (NLP) to biomedical text, whether this technology can facilitate tasks such as database curation remains unclear. RESULTS: PaperBrowser is the first NLP-powered interface developed under a user-centered approach to improve the way in which FlyBase curators navigate an article. In this paper, we first discuss how observing curators at work informed the design and evaluation of PaperBrowser. We then present how we appraise PaperBrowser's navigational functionalities in a user-based study using a text highlighting task and evaluation criteria from Human-Computer Interaction. Our results show that PaperBrowser reduces the number of interactions between two highlighting events and therefore improves navigational efficiency by about 58% compared to the navigational mechanism previously available to the curators. Moreover, PaperBrowser is shown to provide curators with navigational utility enhanced by over 74%, irrespective of the different ways in which they highlight text in the article. CONCLUSION: We show that state-of-the-art performance in certain NLP tasks, such as Named Entity Recognition and Anaphora Resolution, can be combined with the navigational functionalities of PaperBrowser to support curation quite successfully.
Subject(s)
Artificial Intelligence; Database Management Systems; Databases, Bibliographic; Natural Language Processing; Periodicals as Topic; Software; Vocabulary, Controlled; Algorithms; Information Storage and Retrieval/methods
ABSTRACT
FlyBase (http://flybase.org) is the primary database of integrated genetic and genomic data about the Drosophilidae, of which Drosophila melanogaster is the most extensively studied species. Information in FlyBase originates from a variety of sources, ranging from large-scale genome projects to the primary research literature. Data types include sequence-level gene models, molecular classification of gene product functions, mutant phenotypes, mutant lesions and chromosome aberrations, gene expression patterns, transgene insertions, and anatomical images. Query tools allow FlyBase to be interrogated by DNA or protein sequence, by gene or mutant name, or through terms from the several ontologies used to capture functional, phenotypic, and anatomical data. Links between FlyBase and external databases provide extensive opportunities to extend exploration into other model organism databases and resources of biological and molecular information. This review introduces the FlyBase web server and query tools.
Subject(s)
Databases, Factual; Drosophila melanogaster/genetics; Drosophila melanogaster/physiology; Algorithms; Animals; Computational Biology/methods; Databases, Genetic; Drosophila Proteins/genetics; Genes, Insect; Genome, Insect; Genomics/instrumentation; Genomics/methods; Internet; Phenotype; Software; User-Computer Interface
ABSTRACT
FlyBase (http://flybase.org) is the primary repository of genetic and molecular data of the insect family Drosophilidae. For the most extensively studied species, Drosophila melanogaster, a wide range of data are presented in integrated formats. Data types include mutant phenotypes, molecular characterization of mutant alleles and aberrations, cytological maps, wild-type expression patterns, anatomical images, transgenic constructs and insertions, sequence-level gene models and molecular classification of gene product functions. There is a growing body of data for other Drosophila species; this is expected to increase dramatically over the next year, with the completion of draft-quality genomic sequences of an additional 11 Drosophila species.
Subject(s)
Databases, Genetic; Drosophila Proteins/genetics; Drosophila melanogaster/genetics; Drosophila/genetics; Animals; Chromosome Mapping; Drosophila Proteins/chemistry; Drosophila Proteins/physiology; Drosophila melanogaster/physiology; Genes, Insect; Genome; Models, Genetic
ABSTRACT
Applying Natural Language Processing techniques to biomedical text as a potential aid to curation has become the focus of intensive research. However, the development of integrated systems that address curators' real-world needs has been studied less rigorously. This paper addresses this gap and presents generic tools developed to assist FlyBase curators. We discuss how they have been integrated into the curation workflow and present initial evidence of their effectiveness.
Subject(s)
Databases, Genetic; Drosophila/genetics; Natural Language Processing; Animals; Computational Biology; Pilot Projects; Software
ABSTRACT
The sequence and genome annotations of Drosophila melanogaster were initially published in late 1999 and early 2000. Since then, the Berkeley Drosophila Genome Project (BDGP) and FlyBase have improved the quality of the sequence and reviewed the annotations by hand, respectively, to produce an account of the fruit fly genome that is of the highest quality. This review discusses the main features of this process, both from the point of view of the biology revealed in the end result and in the development of software that has been central to this genome sequencing and annotation project.
Subject(s)
Drosophila melanogaster/genetics; Genes, Insect; Genome; Sequence Analysis, DNA/methods; Animals; Databases, Genetic; Databases, Protein
ABSTRACT
We have begun a genetic analysis to dissect the process of myogenesis by surveying the X chromosome of Drosophila melanogaster for mutations that affect embryonic muscle development. Using polarised light microscopy and antibody staining techniques we analysed embryos hemizygous for a series of 67 deletion mutations that together cover an estimated 85% of the X chromosome, or 16.5% of the genome. Whereas the mature wild type embryo has a regular array of contractile muscles that insert into the epidermis, 31 of the deletion mutants have defects in muscle pattern, contractility or both, that cannot be attributed simply to epidermal defects and identify functions required for wild type muscle development. We have defined mutant pattern phenotypes that can be described in terms of muscle absences, incomplete myoblast fusion, failure of attachment of the muscle to the epidermis or mispositioning of attachment sites. Thus muscle development can be mutationally disrupted in characteristic and interpretable ways. The areas of overlap of the 31 deletions define 19 regions of the X chromosome that include genes whose products are essential for various aspects of myogenesis. We conclude that our screen can usefully identify loci coding for gene products essential in muscle development.
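The two coverage figures in the preceding abstract (85% of the X chromosome, or 16.5% of the genome) together imply what share of the genome the X chromosome was taken to represent. The short check below is our own reconstruction from those two numbers, not a calculation reported by the authors; the symbol f_X (fraction of the genome on the X chromosome) is introduced here for illustration only.

```latex
% f_X: assumed fraction of the genome carried by the X chromosome (symbol introduced here)
% 0.85 of the X chromosome corresponds to 0.165 of the genome
0.85 \, f_X = 0.165
\quad\Longrightarrow\quad
f_X = \frac{0.165}{0.85} \approx 0.194
```

On this reading, the estimate treats the X chromosome as roughly one fifth (about 19%) of the genome.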
ABSTRACT
BACKGROUND: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences. RESULTS: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes. CONCLUSIONS: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.