Search | BVS Integralidade em Saúde

1.

Ten simple rules for making training materials FAIR.

Garcia, Leyla; Batut, Bérénice; Burke, Melissa L; Kuzak, Mateusz; Psomopoulos, Fotis; Arcila, Ricardo; Attwood, Teresa K; Beard, Niall; Carvalho-Silva, Denise; Dimopoulos, Alexandros C; Del Angel, Victoria Dominguez; Dumontier, Michel; Gurwitz, Kim T; Krause, Roland; McQuilton, Peter; Le Pera, Loredana; Morgan, Sarah L; Rauste, Päivi; Via, Allegra; Kahlem, Pascal; Rustici, Gabriella; van Gelder, Celia W G; Palagi, Patricia M.

PLoS Comput Biol ; 16(5): e1007854, 2020 05.

Article in English | MEDLINE | ID: mdl-32437350

ABSTRACT

Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception; but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps; and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it's sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: They're often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all.

Subjects

Computer-Assisted Instruction/standards , Guidelines as Topic , Biology/education , Computational Biology , Humans , Information Storage and Retrieval

2.

FlyBase 102--advanced approaches to interrogating FlyBase.

St Pierre, Susan E; Ponting, Laura; Stefancsik, Raymund; McQuilton, Peter.

Nucleic Acids Res ; 42(Database issue): D780-8, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24234449

ABSTRACT

FlyBase (http://flybase.org) is the leading website and database of Drosophila genes and genomes. Whether you are using the fruit fly Drosophila melanogaster as an experimental system or wish to understand Drosophila biological knowledge in relation to human disease or to other model systems, FlyBase can help you successfully find the information you are looking for. Here, we demonstrate some of our more advanced searching systems and highlight some of our new tools for searching the wealth of data on FlyBase. The first section explores gene function in FlyBase, using our TermLink tool to search with Controlled Vocabulary terms and our new RNA-Seq Search tool to search gene expression. The second section of this article describes a few ways to search genomic data in FlyBase, using our BLAST server and the new implementation of GBrowse 2, as well as our new FeatureMapper tool. Finally, we move on to discuss our most powerful search tool, QueryBuilder, before describing pre-computed cuts of the data and how to query the database programmatically.

Subjects

Genetic Databases , Drosophila/genetics , Insect Genome , Animals , Drosophila melanogaster/genetics , Gene Expression Profiling , Gene Ontology , Insect Genes , Internet , Phenotype , RNA Sequence Analysis

3.

FlyBase 101--the basics of navigating FlyBase.

McQuilton, Peter; St Pierre, Susan E; Thurmond, Jim.

Nucleic Acids Res ; 40(Database issue): D706-14, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22127867

ABSTRACT

FlyBase (http://flybase.org) is the leading database and web portal for genetic and genomic information on the fruit fly Drosophila melanogaster and related fly species. Whether you use the fruit fly as an experimental system or want to apply Drosophila biological knowledge to another field of study, FlyBase can help you successfully navigate the wealth of available Drosophila data. Here, we review the FlyBase web site with novice and less-experienced users of FlyBase in mind and point out recent developments stemming from the availability of genome-wide data from the modENCODE project. The first section of this paper explains the organization of the web site and describes the report pages available on FlyBase, focusing on the most popular, the Gene Report. The next section introduces some of the search tools available on FlyBase, in particular, our heavily used and recently redesigned search tool QuickSearch, found on the FlyBase homepage. The final section concerns genomic data, including recent modENCODE (http://www.modencode.org) data, available through our Genome Browser, GBrowse.

Subjects

Genetic Databases , Drosophila melanogaster/genetics , Insect Genome , Animals , Insect Genes , Genomics , Internet , Software

4.

FAIR, ethical, and coordinated data sharing for COVID-19 response: a scoping review and cross-sectional survey of COVID-19 data sharing platforms and registries.

Maxwell, Lauren; Shreedhar, Priya; Dauga, Delphine; McQuilton, Peter; Terry, Robert F; Denisiuk, Alisa; Molnar-Gabor, Fruzsina; Saxena, Abha; Sansone, Susanna-Assunta.

Lancet Digit Health ; 5(10): e712-e736, 2023 10.

Article in English | MEDLINE | ID: mdl-37775189

ABSTRACT

Data sharing is central to the rapid translation of research into advances in clinical medicine and public health practice. In the context of COVID-19, there has been a rush to share data marked by an explosion of population-specific and discipline-specific resources for collecting, curating, and disseminating participant-level data. We conducted a scoping review and cross-sectional survey to identify and describe COVID-19-related platforms and registries that harmonise and share participant-level clinical, omics (eg, genomic and metabolomic data), imaging data, and metadata. We assess how these initiatives map to the best practices for the ethical and equitable management of data and the findable, accessible, interoperable, and reusable (FAIR) principles for data resources. We review gaps and redundancies in COVID-19 data-sharing efforts and provide recommendations to build on existing synergies that align with frameworks for effective and equitable data reuse. We identified 44 COVID-19-related registries and 20 platforms from the scoping review. Data-sharing resources were concentrated in high-income countries and siloed by comorbidity, body system, and data type. Resources for harmonising and sharing clinical data were less likely to implement FAIR principles than those sharing omics or imaging data. Our findings are that more data sharing does not equate to better data sharing, and the semantic and technical interoperability of platforms and registries harmonising and sharing COVID-19-related participant-level data needs to improve to facilitate the global collaboration required to address the COVID-19 crisis.

Subjects

COVID-19 , Humans , COVID-19/epidemiology , Cross-Sectional Studies , Information Dissemination/methods , Registries , Metadata

5.

Drosophila neurotrophins reveal a common mechanism for nervous system formation.

Zhu, Bangfu; Pennack, Jenny A; McQuilton, Peter; Forero, Manuel G; Mizuguchi, Kenji; Sutcliffe, Ben; Gu, Chun-Jing; Fenton, Janine C; Hidalgo, Alicia.

PLoS Biol ; 6(11): e284, 2008 Nov 18.

Article in English | MEDLINE | ID: mdl-19018662

ABSTRACT

Neurotrophic interactions occur in Drosophila, but to date, no neurotrophic factor had been found. Neurotrophins are the main vertebrate secreted signalling molecules that link nervous system structure and function: they regulate neuronal survival, targeting, synaptic plasticity, memory and cognition. We have identified a neurotrophic factor in flies, Drosophila Neurotrophin (DNT1), structurally related to all known neurotrophins and highly conserved in insects. By investigating with genetics the consequences of removing DNT1 or adding it in excess, we show that DNT1 maintains neuronal survival, as more neurons die in DNT1 mutants and expression of DNT1 rescues naturally occurring cell death, and it enables targeting by motor neurons. We show that Spätzle and a further fly neurotrophin superfamily member, DNT2, also have neurotrophic functions in flies. Our findings imply that most likely a neurotrophin was present in the common ancestor of all bilateral organisms, giving rise to invertebrate and vertebrate neurotrophins through gene or whole-genome duplications. This work provides a missing link between aspects of neuronal function in flies and vertebrates, and it opens the opportunity to use Drosophila to investigate further aspects of neurotrophin function and to model related diseases.

Subjects

Drosophila Proteins/physiology , Drosophila/embryology , Nerve Growth Factors/physiology , Nervous System/embryology , Neurons/metabolism , Animals , Axons , Base Sequence , Cell Death , Conserved Sequence , Drosophila/genetics , Drosophila Proteins/chemistry , Drosophila Proteins/genetics , Gene Expression , Humans , Locomotion , Nerve Growth Factors/chemistry , Nerve Growth Factors/genetics , Neurons/physiology , Protein Sequence Analysis

6.

FlyBase: enhancing Drosophila Gene Ontology annotations.

Tweedie, Susan; Ashburner, Michael; Falls, Kathleen; Leyland, Paul; McQuilton, Peter; Marygold, Steven; Millburn, Gillian; Osumi-Sutherland, David; Schroeder, Andrew; Seal, Ruth; Zhang, Haiyan.

Nucleic Acids Res ; 37(Database issue): D555-9, 2009 Jan.

Article in English | MEDLINE | ID: mdl-18948289

ABSTRACT

FlyBase (http://flybase.org) is a database of Drosophila genetic and genomic information. Gene Ontology (GO) terms are used to describe three attributes of wild-type gene products: their molecular function, the biological processes in which they play a role, and their subcellular location. This article describes recent changes to the FlyBase GO annotation strategy that are improving the quality of the GO annotation data. Many of these changes stem from our participation in the GO Reference Genome Annotation Project--a multi-database collaboration producing comprehensive GO annotation sets for 12 diverse species.

Subjects

Genetic Databases , Drosophila Proteins/genetics , Drosophila/genetics , Insect Genes , Animals , Insect Genome , Genomics , Controlled Vocabulary

7.

Biocuration - mapping resources and needs.

Holinski, Alexandra; Burke, Melissa L; Morgan, Sarah L; McQuilton, Peter; Palagi, Patricia M.

F1000Res ; 9, 2020.

Article in English | MEDLINE | ID: mdl-33145007

ABSTRACT

Background: Biocuration involves a variety of teams and individuals across the globe. However, they may not self-identify as biocurators, as they may be unaware of biocuration as a career path or because biocuration is only part of their role. The lack of a clear, up-to-date profile of biocuration creates challenges for organisations like ELIXIR, the ISB and GOBLET to systematically support biocurators and for biocurators themselves to develop their own careers. Therefore, the ELIXIR Training Platform launched an Implementation Study in order to i) identify communities of biocurators, ii) map the type of curation work being done, iii) assess biocuration training, and iv) draw a picture of biocuration career development. Methods: To achieve the goals of the study, we carried out a global survey on the nature of biocuration work, the tools and resources that are used, training that has been received and additional training needs. To examine these topics in more detail we ran workshop-based discussions at ISB Biocuration Conference 2019 and the ELIXIR All Hands Meeting 2019. We also had guided conversations with selected people from the EMBL-European Bioinformatics Institute. Results: The study illustrates that biocurators have diverse job titles, are highly skilled, perform a variety of activities and use a wide range of tools and resources. The study emphasises the need for training in programming and coding skills, but also highlights the difficulties curators face in terms of career development and community building. Conclusion: Biocurators themselves, as well as organisations like ELIXIR, GOBLET and ISB must work together towards structural change to overcome these difficulties. In this article we discuss recommendations to ensure that biocuration as a role is visible and valued, thereby helping biocurators to proceed with their career.

Subjects

Computational Biology , Data Curation/methods , Data Mining , Humans

8.

Author Correction: Evaluating FAIR maturity through a scalable, automated, community-governed framework.

Wilkinson, Mark D; Dumontier, Michel; Sansone, Susanna-Assunta; Bonino da Silva Santos, Luiz Olavo; Prieto, Mario; Batista, Dominique; McQuilton, Peter; Kuhn, Tobias; Rocca-Serra, Philippe; Crosas, Mercè; Schultes, Erik.

Sci Data ; 6(1): 230, 2019 Oct 21.

Article in English | MEDLINE | ID: mdl-31636272

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

9.

Evaluating FAIR maturity through a scalable, automated, community-governed framework.

Wilkinson, Mark D; Dumontier, Michel; Sansone, Susanna-Assunta; Bonino da Silva Santos, Luiz Olavo; Prieto, Mario; Batista, Dominique; McQuilton, Peter; Kuhn, Tobias; Rocca-Serra, Philippe; Crosas, Mercè; Schultes, Erik.

Sci Data ; 6(1): 174, 2019 09 20.

Article in English | MEDLINE | ID: mdl-31541130

ABSTRACT

Transparent evaluations of FAIRness are increasingly required by a wide range of stakeholders, from scientists to publishers, funding agencies and policy makers. We propose a scalable, automatable framework to evaluate digital resources that encompasses measurable indicators, open source tools, and participation guidelines, which come together to accommodate domain relevant community-defined FAIR assessments. The components of the framework are: (1) Maturity Indicators - community-authored specifications that delimit a specific automatically-measurable FAIR behavior; (2) Compliance Tests - small Web apps that test digital resources against individual Maturity Indicators; and (3) the Evaluator, a Web application that registers, assembles, and applies community-relevant sets of Compliance Tests against a digital resource, and provides a detailed report about what a machine "sees" when it visits that resource. We discuss the technical and social considerations of FAIR assessments, and how this translates to our community-driven infrastructure. We then illustrate how the output of the Evaluator tool can serve as a roadmap to assist data stewards to incrementally and realistically improve the FAIRness of their resources.
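
The three components named in this abstract can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' implementation: the class name `ComplianceTest`, the function `evaluate`, the two indicator names, and the use of an in-memory metadata dictionary in place of a live Web resource are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ComplianceTest:
    """One test for one Maturity Indicator (hypothetical structure)."""
    indicator: str                           # name of the indicator being tested
    check: Callable[[Dict[str, str]], bool]  # returns True if the resource passes


def evaluate(metadata: Dict[str, str], tests: List[ComplianceTest]) -> Dict[str, bool]:
    """Apply every test to the resource metadata and collect a pass/fail report."""
    return {test.indicator: test.check(metadata) for test in tests}


# Hypothetical indicators chosen for illustration only.
TESTS = [
    ComplianceTest(
        "has_persistent_identifier",
        lambda m: m.get("identifier", "").startswith("https://doi.org/"),
    ),
    ComplianceTest("has_license", lambda m: bool(m.get("license"))),
]

# Placeholder metadata; the DOI is not a real identifier.
report = evaluate(
    {"identifier": "https://doi.org/10.1234/example", "license": ""},
    TESTS,
)
print(report)
```

In this toy version the report simply maps each indicator to a pass or fail value, which mirrors the idea of a per-indicator roadmap for improving a resource.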

10.

FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources.

Clarke, Daniel J B; Wang, Lily; Jones, Alex; Wojciechowicz, Megan L; Torre, Denis; Jagodnik, Kathleen M; Jenkins, Sherry L; McQuilton, Peter; Flamholz, Zachary; Silverstein, Moshe C; Schilder, Brian M; Robasky, Kimberly; Castillo, Claris; Idaszak, Ray; Ahalt, Stanley C; Williams, Jason; Schurer, Stephan; Cooper, Daniel J; de Miranda Azevedo, Ricardo; Klenk, Juergen A; Haendel, Melissa A; Nedzel, Jared; Avillach, Paul; Shimoyama, Mary E; Harris, Rayna M; Gamble, Meredith; Poten, Rudy; Charbonneau, Amanda L; Larkin, Jennie; Brown, C Titus; Bonazzi, Vivien R; Dumontier, Michel J; Sansone, Susanna-Assunta; Ma'ayan, Avi.

Cell Syst ; 9(5): 417-421, 2019 11 27.

Article in English | MEDLINE | ID: mdl-31677972

ABSTRACT

As more digital resources are produced by the research community, it is becoming increasingly important to harmonize and organize them for synergistic utilization. The findable, accessible, interoperable, and reusable (FAIR) guiding principles have prompted many stakeholders to consider strategies for tackling this challenge. The FAIRshake toolkit was developed to enable the establishment of community-driven FAIR metrics and rubrics paired with manual and automated FAIR assessments. FAIR assessments are visualized as an insignia that can be embedded within digital-resources-hosting websites. Using FAIRshake, a variety of biomedical digital resources were manually and automatically evaluated for their level of FAIRness.

Subjects

Information Dissemination/methods , Internet/trends , Online Systems/standards , Health Resources/standards , Humans

11.

Natural language processing in aid of FlyBase curators.

Karamanis, Nikiforos; Seal, Ruth; Lewin, Ian; McQuilton, Peter; Vlachos, Andreas; Gasperin, Caroline; Drysdale, Rachel; Briscoe, Ted.

BMC Bioinformatics ; 9: 193, 2008 Apr 14.

Article in English | MEDLINE | ID: mdl-18410678

ABSTRACT

BACKGROUND: Despite increasing interest in applying Natural Language Processing (NLP) to biomedical text, whether this technology can facilitate tasks such as database curation remains unclear. RESULTS: PaperBrowser is the first NLP-powered interface that was developed under a user-centered approach to improve the way in which FlyBase curators navigate an article. In this paper, we first discuss how observing curators at work informed the design and evaluation of PaperBrowser. Then, we present how we appraise PaperBrowser's navigational functionalities in a user-based study using a text highlighting task and evaluation criteria of Human-Computer Interaction. Our results show that PaperBrowser reduces the amount of interactions between two highlighting events and therefore improves navigational efficiency by about 58% compared to the navigational mechanism that was previously available to the curators. Moreover, PaperBrowser is shown to provide curators with enhanced navigational utility by over 74% irrespective of the different ways in which they highlight text in the article. CONCLUSION: We show that state-of-the-art performance in certain NLP tasks such as Named Entity Recognition and Anaphora Resolution can be combined with the navigational functionalities of PaperBrowser to support curation quite successfully.

Subjects

Artificial Intelligence , Database Management Systems , Bibliographic Databases , Natural Language Processing , Periodicals as Topic , Software , Controlled Vocabulary , Algorithms , Information Storage and Retrieval/methods

12.

BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences.

McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27189610

ABSTRACT

BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only on the resource itself, but also on its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address) in the last year thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation. Database URL: https://www.biosharing.org.

Subjects

Biological Science Disciplines , Crowdsourcing/standards , Database Management Systems , Factual Databases , Metadata/standards , Biological Science Disciplines/legislation & jurisprudence , Biological Science Disciplines/standards , Computational Biology , Database Management Systems/legislation & jurisprudence , Database Management Systems/standards , Factual Databases/legislation & jurisprudence , Factual Databases/standards , Humans , Internet , Registries/standards , User-Computer Interface

13.

Overview of the interactive task in BioCreative V.

Wang, Qinghua; S Abdul, Shabbir; Almeida, Lara; Ananiadou, Sophia; Balderas-Martínez, Yalbi I; Batista-Navarro, Riza; Campos, David; Chilton, Lucy; Chou, Hui-Jou; Contreras, Gabriela; Cooper, Laurel; Dai, Hong-Jie; Ferrell, Barbra; Fluck, Juliane; Gama-Castro, Socorro; George, Nancy; Gkoutos, Georgios; Irin, Afroza K; Jensen, Lars J; Jimenez, Silvia; Jue, Toni R; Keseler, Ingrid; Madan, Sumit; Matos, Sérgio; McQuilton, Peter; Milacic, Marija; Mort, Matthew; Natarajan, Jeyakumar; Pafilis, Evangelos; Pereira, Emiliano; Rao, Shruti; Rinaldi, Fabio; Rothfels, Karen; Salgado, David; Silva, Raquel M; Singh, Onkar; Stefancsik, Raymund; Su, Chu-Hsien; Subramani, Suresh; Tadepally, Hamsa D; Tsaprouni, Loukia; Vasilevsky, Nicole; Wang, Xiaodong; Chatr-Aryamontri, Andrew; Laulederkind, Stanley J F; Matis-Mitchell, Sherri; McEntyre, Johanna; Orchard, Sandra; Pundir, Sangya; Rodriguez-Esteban, Raul.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27589961

ABSTRACT

Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and then were asked whether they were able to achieve the task and the level of difficulty in completing the task. In this manuscript, we describe the development of the interactive task, from planning to execution and discuss major findings for the systems tested. Database URL: http://www.biocreative.org.

Subjects

Data Curation/methods , Data Mining/methods , Electronic Data Processing/methods

14.

FAIRsharing as a community approach to standards, repositories and policies.

Sansone, Susanna-Assunta; McQuilton, Peter; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra; Izzo, Massimiliano; Lister, Allyson L; Thurston, Milo.

Nat Biotechnol ; 37(4): 358-367, 2019 04.

Article in English | MEDLINE | ID: mdl-30940948

Subjects

Biotechnology/standards , Information Dissemination/methods , Community Participation/methods , Factual Databases/standards , Humans , Scholarly Communication/standards

15.

tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles.

Cejuela, Juan Miguel; McQuilton, Peter; Ponting, Laura; Marygold, Steven J; Stefancsik, Raymund; Millburn, Gillian H; Rost, Burkhard.

Database (Oxford) ; 2014(0): bau033, 2014.

Article in English | MEDLINE | ID: mdl-24715220

ABSTRACT

The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the 'tagtog' system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. DATABASE URL: www.tagtog.net, www.flybase.org.

Subjects

Data Mining/methods , Molecular Sequence Annotation/methods , Animals , Drosophila/genetics , Internet , Software , User-Computer Interface , Controlled Vocabulary

16.

BC4GO: a full-text corpus for the BioCreative IV GO task.

Van Auken, Kimberly; Schaeffer, Mary L; McQuilton, Peter; Laulederkind, Stanley J F; Li, Donghui; Wang, Shur-Jen; Hayman, G Thomas; Tweedie, Susan; Arighi, Cecilia N; Done, James; Müller, Hans-Michael; Sternberg, Paul W; Mao, Yuqing; Wei, Chih-Hsuan; Lu, Zhiyong.

Database (Oxford) ; 2014, 2014.

Article in English | MEDLINE | ID: mdl-25070993

ABSTRACT

Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ~10% of relevant evidence sentences and 30% distinct GO terms, while the Results/Experiment section has nearly 60% relevant sentences and >70% GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need of using full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community. Database URL: http://www.biocreative.org/resources/corpora/bc-iv-go-task-corpus/.
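
As a rough illustration of how an F1-style agreement score between two annotators can be computed, the sketch below treats each annotator's selected sentences as a set and scores exact overlap. This is an assumption for illustration only: the abstract does not spell out its strict and relaxed matching rules, and the function name `f1_agreement` and the sentence identifiers are hypothetical.

```python
def f1_agreement(a: set, b: set) -> float:
    """F1 between two annotation sets, treating either annotator as the reference.

    With exact matching, precision = |a & b| / |b| and recall = |a & b| / |a|,
    so F1 simplifies to 2 * |a & b| / (|a| + |b|).
    """
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    overlap = len(a & b)
    return 2 * overlap / (len(a) + len(b))


# Hypothetical sentence identifiers selected by two curators.
annotator_1 = {"s1", "s2", "s3", "s4"}
annotator_2 = {"s3", "s4", "s5", "s6"}

score = f1_agreement(annotator_1, annotator_2)
print(f"F1 agreement: {score:.2f}")
```

A relaxed variant would count partially overlapping or adjacent passages as matches, which is one plausible reason a relaxed score can be much higher than a strict one.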

Subjects

Data Mining/methods , Genetic Databases , Molecular Sequence Annotation , Software , Controlled Vocabulary , Computational Biology/methods , Humans

17.

Overview of the gene ontology task at BioCreative IV.

Mao, Yuqing; Van Auken, Kimberly; Li, Donghui; Arighi, Cecilia N; McQuilton, Peter; Hayman, G Thomas; Tweedie, Susan; Schaeffer, Mary L; Laulederkind, Stanley J F; Wang, Shur-Jen; Gobeill, Julien; Ruch, Patrick; Luu, Anh Tuan; Kim, Jung-Jae; Chiang, Jung-Hsien; Chen, Yu-De; Yang, Chia-Jung; Liu, Hongfang; Zhu, Dongqing; Li, Yanpeng; Yu, Hong; Emadzadeh, Ehsan; Gonzalez, Graciela; Chen, Jian-Ming; Dai, Hong-Jie; Lu, Zhiyong.

Database (Oxford) ; 2014, 2014.

Article in English | MEDLINE | ID: mdl-25157073

ABSTRACT

Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. 
Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation. DATABASE URL: http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/.

Subjects

Computational Biology/methods , Data Mining , Gene Ontology , Molecular Sequence Annotation/methods , Algorithms , Humans , Reproducibility of Results

18.

The Drosophila phenotype ontology.

Osumi-Sutherland, David; Marygold, Steven J; Millburn, Gillian H; McQuilton, Peter A; Ponting, Laura; Stefancsik, Raymund; Falls, Kathleen; Brown, Nicholas H; Gkoutos, Georgios V.

J Biomed Semantics ; 4(1): 30, 2013 Oct 18.

Article in English | MEDLINE | ID: mdl-24138933

ABSTRACT

BACKGROUND: Phenotype ontologies are queryable classifications of phenotypes. They provide a widely used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions. RESULTS: We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable. CONCLUSIONS: The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes.

19.

Opportunities for text mining in the FlyBase genetic literature curation workflow.

McQuilton, Peter.

Database (Oxford) ; 2012: bas039, 2012.

Article in English | MEDLINE | ID: mdl-23160412

ABSTRACT

FlyBase is the model organism database for Drosophila genetic and genomic information. Over the last 20 years, FlyBase has had to adapt and change to keep abreast of advances in biology and database design. We are continually looking for ways to improve curation efficiency and efficacy. Genetic literature curation focuses on the extraction of genetic entities (e.g. genes, mutant alleles, transgenic constructs) and their associated phenotypes and Gene Ontology terms from the published literature. Over 2000 Drosophila research articles are now published every year. These articles are becoming ever more data-rich and there is a growing need for text mining to shoulder some of the burden of paper triage and data extraction. In this article, we describe our curation workflow, along with some of the problems and bottlenecks therein, and highlight the opportunities for text mining. We do so in the hope of encouraging the BioCreative community to help us to develop effective methods to mine this torrent of information. DATABASE URL: http://flybase.org

Subjects

Data Mining/methods , Databases, Genetic , Drosophila/genetics , Workflow , Animals , Phenotype

20.

Inside FlyBase: biocuration as a career.

St Pierre, Susan; McQuilton, Peter.

Fly (Austin) ; 3(1): 112-4, 2009.

Article in English | MEDLINE | ID: mdl-19182544

ABSTRACT

As research in the biological sciences continues to advance at a rapid pace, it is increasingly important that the data be captured, standardized, organized and made accessible to the scientific community. This is the job of a biocurator. Here we describe the process of biocuration from our perspective as FlyBase curators.

Subjects

Databases, Factual , Drosophila , Animals , Career Choice , Data Collection , Databases, Genetic , Internet
