1.
Nucleic Acids Res ; 52(D1): D10-D17, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38015445

ABSTRACT

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the latest developments in the services provided by EMBL-EBI data resources to scientific communities globally. These developments aim to ensure EMBL-EBI resources meet the current and future needs of these scientific communities, accelerating the impact of open biological data for all.


Subjects
Academies and Institutes , Computational Biology , Computational Biology/organization & administration , Computational Biology/trends , Academies and Institutes/organization & administration , Academies and Institutes/trends , Databases, Nucleic Acid , Europe
2.
Nucleic Acids Res ; 51(D1): D9-D17, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477213

ABSTRACT

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases make them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly available to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.


Subjects
Artificial Intelligence , Computational Biology , Data Management , Databases, Factual , Genome , Internet
3.
Nucleic Acids Res ; 50(D1): D11-D19, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850134

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAfold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been made to EMBL-EBI's online training offering.


Subjects
Computational Biology/education , Computational Biology/methods , Databases, Factual , Academies and Institutes , Artificial Intelligence , COVID-19 , Databases, Factual/economics , Databases, Factual/statistics & numerical data , Databases, Pharmaceutical , Databases, Protein , Europe , Genome, Human , Humans , Information Storage and Retrieval , RNA, Untranslated/genetics , SARS-CoV-2/genetics
4.
Nucleic Acids Res ; 49(D1): D29-D37, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33245775

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effectively tackle a global health crisis. EMBL-EBI has been able to build on its position to contribute to the fight against COVID-19 in a number of ways. Firstly, EMBL-EBI has used its infrastructure, expertise and network of international collaborations to help build the European COVID-19 Data Platform (https://www.covid19dataportal.org/), which brings together COVID-19 biomolecular data and connects it to researchers, clinicians and public health professionals. By September 2020, the COVID-19 Data Platform had integrated in excess of 170 000 COVID-19 biomolecular data and literature records, collected through a number of EMBL-EBI resources. Secondly, EMBL-EBI has strived to continue its support of the life science communities through the crisis, with updated training provision and improved service provision throughout its resources. The COVID-19 pandemic has highlighted the importance of EMBL-EBI's core principles, including international cooperation, resource sharing and central data brokering, and has further empowered scientific cooperation.


Subjects
COVID-19/prevention & control , Computational Biology/statistics & numerical data , Databases, Nucleic Acid/statistics & numerical data , Information Storage and Retrieval/methods , SARS-CoV-2/genetics , Viral Proteins/genetics , COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Computational Biology/organization & administration , Databases, Nucleic Acid/organization & administration , Global Health , Humans , Information Storage and Retrieval/statistics & numerical data , Internet , Pandemics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Viral Proteins/metabolism
5.
Nucleic Acids Res ; 49(D1): D1507-D1514, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33180112

ABSTRACT

Europe PMC (https://europepmc.org) is a database of research articles, including peer reviewed full text articles and abstracts, and preprints - all freely available for use via website, APIs and bulk download. This article outlines new developments since 2017 where work has focussed on three key areas: (i) Europe PMC has added to its core content to include life science preprint abstracts and a special collection of full text of COVID-19-related preprints. Europe PMC is unique as an aggregator of biomedical preprints alongside peer-reviewed articles, with over 180 000 preprints available to search. (ii) Europe PMC has significantly expanded its links to content related to the publications, such as links to Unpaywall, providing wider access to full text, preprint peer-review platforms, all major curated data resources in the life sciences, and experimental protocols. The redesigned Europe PMC website features the PubMed abstract and corresponding PMC full text merged into one article page; there is more evident and user-friendly navigation within articles and to related content, plus a figure browse feature. (iii) The expanded annotations platform offers ∼1.3 billion text mined biological terms and concepts sourced from 10 providers and over 40 global data resources.


Subjects
Biological Science Disciplines/statistics & numerical data , COVID-19/prevention & control , Data Curation/statistics & numerical data , Data Mining/statistics & numerical data , Databases, Factual/statistics & numerical data , PubMed , SARS-CoV-2/isolation & purification , Biological Science Disciplines/methods , Biomedical Research/methods , Biomedical Research/statistics & numerical data , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Data Mining/methods , Epidemics , Europe , Humans , Internet , SARS-CoV-2/physiology
7.
PLoS Biol ; 15(6): e2001414, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28662064

ABSTRACT

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.


Subjects
Biological Science Disciplines/methods , Computational Biology/methods , Data Mining/methods , Software Design , Software , Biological Science Disciplines/statistics & numerical data , Biological Science Disciplines/trends , Computational Biology/trends , Data Mining/statistics & numerical data , Data Mining/trends , Databases, Factual/statistics & numerical data , Databases, Factual/trends , Forecasting , Humans , Internet
8.
Nucleic Acids Res ; 46(D1): D1266-D1270, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069414

ABSTRACT

BioStudies (www.ebi.ac.uk/biostudies) is a new public database that organizes data from biological studies. Typically, but not exclusively, a study is associated with a publication. BioStudies offers a simple way to describe the study structure, and provides flexible data deposition tools and data access interfaces. The actual data can be stored either in BioStudies or remotely, or both. BioStudies imports supplementary data from Europe PMC, and is a resource for authors and publishers for packaging data during the manuscript preparation process. It also can support data management needs of collaborative projects. The growth in multiomics experiments and other multi-faceted approaches to life sciences research mean that studies result in a diversity of data outputs in multiple locations. BioStudies presents a solution to ensuring that all these data and the associated publication(s) can be found coherently in the longer term.


Subjects
Biological Science Disciplines , Databases, Factual , Animals , Humans , Internet , Software
9.
Nucleic Acids Res ; 46(D1): D1254-D1260, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29161421

ABSTRACT

Europe PMC (https://europepmc.org) is a comprehensive resource of biomedical research publications that offers advanced tools for search, retrieval, and interaction with the scientific literature. This article outlines new developments since 2014. In addition to delivering the core database and services, Europe PMC focuses on three areas of development: individual user services, data integration, and infrastructure to support text and data mining. Europe PMC now provides user accounts to save search queries and claim publications to ORCIDs, as well as open access profiles for authors based on public ORCID records. We continue to foster connections between scientific data and literature in a number of ways. All the data behind the paper - whether in structured archives, generic archives or as supplemental files - are now available via links to the BioStudies database. Text-mined biological concepts, including database accession numbers and data DOIs, are highlighted in the text and linked to the appropriate data resources. The SciLite community annotation platform accepts text-mining results from various contributors and overlays them on research articles as licence allows. In addition, text miners and developers can access all open content via APIs or via the FTP site.


Subjects
Biomedical Research , Databases, Bibliographic , Data Mining , Internet , Serial Publications , User-Computer Interface
10.
Nucleic Acids Res ; 45(D1): D985-D994, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899665

ABSTRACT

We have designed and developed a data integration and visualization platform that provides evidence about the association of known and potential drug targets with diseases. The platform is designed to support identification and prioritization of biological targets for follow-up. Each drug target is linked to a disease using integrated genome-wide data from a broad range of data sources. The platform provides either a target-centric workflow to identify diseases that may be associated with a specific target, or a disease-centric workflow to identify targets that may be associated with a specific disease. Users can easily transition between these target- and disease-centric workflows. The Open Targets Validation Platform is accessible at https://www.targetvalidation.org.


Subjects
Computational Biology/methods , Molecular Targeted Therapy , Search Engine , Software , Databases, Factual , Humans , Molecular Targeted Therapy/methods , Reproducibility of Results , Web Browser , Workflow
11.
BMC Bioinformatics ; 14: 104, 2013 Mar 22.
Article in English | MEDLINE | ID: mdl-23517090

ABSTRACT

BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.


Subjects
Data Mining/methods , Databases, Protein , Knowledge Bases , Protein Processing, Post-Translational , Humans , Molecular Sequence Annotation , Proteomics
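
Illustrative note on entry 11: the abstract describes a pattern-matching, rule-based extraction of sentences that state the type and site of a modification, evaluated by precision and recall. The sketch below shows the general idea only and is not the authors' system. The trigger patterns, the residue-site pattern, and the function names (`extract_ptm_sentences`, `precision_recall`) are all assumptions made for illustration.

```python
import re

# Hypothetical trigger patterns (PTM type -> regex); illustrative, not from the paper.
PTM_TRIGGERS = {
    "phosphorylation": re.compile(r"\bphosphorylat(?:ed|ion|es)\b", re.IGNORECASE),
    "acetylation": re.compile(r"\bacetylat(?:ed|ion|es)\b", re.IGNORECASE),
}

# Assumed site notation: residue code (three- or one-letter) plus position,
# e.g. "Ser-473", "Thr308" or "K120".
SITE = re.compile(r"\b(?:Ser|Thr|Tyr|Lys|[STYK])-?\d+\b")


def extract_ptm_sentences(text):
    """Return (sentence, ptm_type, sites) for sentences naming both a PTM type and a site."""
    hits = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        sites = SITE.findall(sentence)
        if not sites:
            continue
        for ptm_type, trigger in PTM_TRIGGERS.items():
            if trigger.search(sentence):
                hits.append((sentence, ptm_type, sites))
    return hits


def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sample = (
        "Akt is phosphorylated at Ser-473 by mTORC2. "
        "The protein is abundant in liver. "
        "p53 is acetylated at K120."
    )
    for sentence, ptm_type, sites in extract_ptm_sentences(sample):
        print(ptm_type, sites, "|", sentence)
```

Precision here is the fraction of extracted items that are correct, and recall is the fraction of gold-standard items that were extracted, which is how the 57% to 94% and 75% to 95% ranges quoted in the abstract are conventionally defined.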
12.
Nucleic Acids Res ; 39(Database issue): D58-65, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21062818

ABSTRACT

UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first 'mirror' site to PMC, which in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access biomedical literature. UKPMC (http://ukpmc.ac.uk) has undergone considerable development since its inception in 2007 and now includes both a UKPMC and PubMed search, as well as access to other records such as Agricola, Patents and recent biomedical theses. UKPMC also differs from PubMed/PMC in that the full text and abstract information can be searched in an integrated manner from one input box. Furthermore, UKPMC contains 'Cited By' information as an alternative way to navigate the literature and has incorporated text-mining approaches to semantically enrich content and integrate it with related database resources. Finally, UKPMC also offers added-value services (UKPMC+) that enable grantees to deposit manuscripts, link papers to grants, publish online portfolios and view citation information on their papers. Here we describe UKPMC and clarify the relationship between PMC and UKPMC, providing historical context and future directions, 10 years on from when PMC was first launched.


Subjects
PubMed , Data Mining , Internet , Software , United Kingdom
13.
Sci Data ; 10(1): 722, 2023 10 19.
Article in English | MEDLINE | ID: mdl-37857688

ABSTRACT

Named entity recognition (NER) is a widely used text-mining and natural language processing (NLP) subtask. In recent years, deep learning methods have superseded traditional dictionary- and rule-based NER approaches. A high-quality dataset is essential to fully leverage recent deep learning advancements. While several gold-standard corpora for biomedical entities in abstracts exist, only a few are based on full-text research articles. The Europe PMC literature database routinely annotates Gene/Proteins, Diseases, and Organisms entities. To transition this pipeline from a dictionary-based to a machine learning-based approach, we have developed a human-annotated full-text corpus for these entities, comprising 300 full-text open-access research articles. Over 72,000 mentions of biomedical concepts have been identified within approximately 114,000 sentences. This article describes the corpus and details how to access and reuse this open community resource.


Subjects
Data Mining , Natural Language Processing , Humans , Data Mining/methods , Databases, Factual , Europe , Machine Learning
14.
Curr Protoc ; 3(3): e694, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36946755

ABSTRACT

In the field of life sciences there is a growing need for literature analysis tools that help scientists tackle information overload. Europe PubMed Central (Europe PMC), a partner of PubMed Central (PMC; National Library of Medicine, 2022), is an open access database of over 41 million life science publications and preprints, enriched with supporting data, reviews, protocols, and other relevant resources. Europe PMC is a trusted repository of choice for many life science funders (Europe PMC, 2022a), offering a suite of innovative search tools that allow users to search and evaluate the literature, including finding highly cited articles, preprints with community peer reviews, or papers referencing a proteomics dataset in the figure legend. In addition, Europe PMC utilizes text-mining to help researchers identify key terms and find data and evidence in the literature. First-time users often do not utilize the wealth of tools Europe PMC offers and can feel overwhelmed about how to perform the most effective search. This protocol, describing how to search and evaluate publications and preprints using Europe PMC, demonstrates how to carry out more efficient and effective literature searches using the tools provided by Europe PMC. This includes discovering the latest findings on a research topic, following research from a specific author, journal, or preprint server, exploring literature on a new method, expanding your reading list with relevant articles, as well as accessing and evaluating publications and preprints of interest. © 2023 EMBL-EBI. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: Finding articles and preprints on a topic of interest
Basic Protocol 2: Accessing an article
Basic Protocol 3: Browsing the article
Basic Protocol 4: Evaluating the article
Basic Protocol 5: Refining search results
Basic Protocol 6: Finding research by author
Basic Protocol 7: Finding a specific article
Basic Protocol 8: Finding information about a methodology
Basic Protocol 9: Finding evidence of biological interactions, relations, and modifications
Basic Protocol 10: Finding data behind a publication
Basic Protocol 11: Expanding a reading list and building a bibliography
Basic Protocol 12: Staying on top of the current literature


Subjects
Biological Science Disciplines , Data Mining , PubMed , Europe , Search Engine
15.
Nat Comput Sci ; 3(6): 514-521, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38177425

ABSTRACT

The carbon footprint of scientific computing is substantial, but environmentally sustainable computational science (ESCS) is a nascent field with many opportunities to thrive. To realize the immense green opportunities and continued, yet sustainable, growth of computer science, we must take a coordinated approach to our current challenges, including greater awareness and transparency, improved estimation and wider reporting of environmental impacts. Here, we present a snapshot of where ESCS stands today and introduce the GREENER set of principles, as well as guidance for best practices moving forward.

16.
Gigascience ; 11, 2022 08 11.
Article in English | MEDLINE | ID: mdl-35950838

ABSTRACT

Metagenomics is a culture-independent method for studying the microbes inhabiting a particular environment. Comparing the composition of samples (functionally/taxonomically), either from a longitudinal study or cross-sectional studies, can provide clues into how the microbiota has adapted to the environment. However, a recurring challenge, especially when comparing results between independent studies, is that key metadata about the sample and molecular methods used to extract and sequence the genetic material are often missing from sequence records, making it difficult to account for confounding factors. Nevertheless, these missing metadata may be found in the narrative of publications describing the research. Here, we describe a machine learning framework that automatically extracts essential metadata for a wide range of metagenomics studies from the literature contained in Europe PMC. This framework has enabled the extraction of metadata from 114,099 publications in Europe PMC, including 19,900 publications describing metagenomics studies in the European Nucleotide Archive (ENA) and MGnify. Using this framework, a new metagenomics annotations pipeline was developed and integrated into Europe PMC to regularly enrich up-to-date ENA and MGnify metagenomics studies with metadata extracted from research articles. These metadata are now available for researchers to explore and retrieve on the MGnify and Europe PMC websites, as well as through the Europe PMC annotations API.


Subjects
Metadata , Metagenomics , Access to Information , Cross-Sectional Studies , Longitudinal Studies , Machine Learning , Metagenomics/methods
17.
Trends Biochem Sci ; 29(12): 627-33, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15544947

ABSTRACT

Sequence similarities among proteins can be used to infer biological function and evolutionary relationships, a powerful approach for investigating new proteins and suggesting future experiments. The availability of public sequence databases and freely distributed tools for sequence analysis has meant that researchers from all over the world can use this approach. For the past 12 years, the Protein Sequence Motif column in TiBS has provided a platform for documenting interesting discoveries from sequence analyses. As the column comes to an end, we look at the published contributions over the years and reflect on sequence analysis through the beginning of the genomic era.


Subjects
Databases, Protein , Proteins/chemistry , Structural Homology, Protein , Animals , Humans
18.
Acta Crystallogr F Struct Biol Commun ; 75(Pt 11): 665-672, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31702580

ABSTRACT

This work presents an annotation tool that automatically locates mentions of particular amino-acid residues in published papers and identifies the protein concerned. These matches can be provided in context or in a searchable format in order for researchers to better use the existing and future literature.


Subjects
Molecular Sequence Annotation , Proteins/chemistry , Publications , Amino Acids/chemistry , Automation , Mutation/genetics , Proteins/genetics , Software
19.
J Biomed Semantics ; 8(1): 20, 2017 Jun 06.
Article in English | MEDLINE | ID: mdl-28587637

ABSTRACT

BACKGROUND: We present the Europe PMC literature component of Open Targets - a target validation platform that integrates various evidence to aid drug target identification and validation. The component identifies target-disease associations in documents from the Europe PMC literature database and ranks the documents by confidence, using rules that draw on expert-provided heuristic information. The confidence score of a given document represents how valuable the document is in the scope of target validation for a given target-disease association by taking into account the credibility of the association based on the properties of the text. The component has served the platform regularly with up-to-date data since December 2015. RESULTS: Currently, there are a total of 1168365 distinct target-disease associations text mined from >26 million PubMed abstracts and >1.2 million Open Access full text articles. Our comparative analyses on the currently available evidence data in the platform revealed that 850179 of these associations are exclusively identified by literature mining. CONCLUSIONS: This component helps the platform's users by providing the most relevant literature hits for a given target and disease. The text mining evidence along with the other types of evidence can be explored visually through https://www.targetvalidation.org and all the evidence data is available for download in JSON format from https://www.targetvalidation.org/downloads/data .


Subjects
Biological Ontologies , Molecular Targeted Therapy , Data Mining , Documentation , Publications , Reproducibility of Results
20.
Article in English | MEDLINE | ID: mdl-28025348

ABSTRACT

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions including (i) the 'scalability' issue due to the increasing need of mining information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.


Subjects
Biomedical Research , Data Curation/methods , Data Mining/methods