1.
Nucleic Acids Res ; 52(D1): D10-D17, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38015445

ABSTRACT

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the latest developments in the services provided by EMBL-EBI data resources to scientific communities globally. These developments aim to ensure EMBL-EBI resources meet the current and future needs of these scientific communities, accelerating the impact of open biological data for all.


Subject(s)
Academies and Institutes , Computational Biology , Computational Biology/organization & administration , Computational Biology/trends , Academies and Institutes/organization & administration , Academies and Institutes/trends , Databases, Nucleic Acid , Europe
2.
Sci Data ; 10(1): 722, 2023 10 19.
Article in English | MEDLINE | ID: mdl-37857688

ABSTRACT

Named entity recognition (NER) is a widely used text-mining and natural language processing (NLP) subtask. In recent years, deep learning methods have superseded traditional dictionary- and rule-based NER approaches. A high-quality dataset is essential to fully leverage recent deep learning advancements. While several gold-standard corpora for biomedical entities in abstracts exist, only a few are based on full-text research articles. The Europe PMC literature database routinely annotates Gene/Proteins, Diseases, and Organisms entities. To transition this pipeline from a dictionary-based to a machine learning-based approach, we have developed a human-annotated full-text corpus for these entities, comprising 300 full-text open-access research articles. Over 72,000 mentions of biomedical concepts have been identified within approximately 114,000 sentences. This article describes the corpus and details how to access and reuse this open community resource.


Subject(s)
Data Mining , Natural Language Processing , Humans , Data Mining/methods , Databases, Factual , Europe , Machine Learning
3.
Curr Protoc ; 3(3): e694, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36946755

ABSTRACT

In the field of life sciences there is a growing need for literature analysis tools that help scientists tackle information overload. Europe PubMed Central (Europe PMC), a partner of PubMed Central (PMC; National Library of Medicine, 2022), is an open access database of over 41 million life science publications and preprints, enriched with supporting data, reviews, protocols, and other relevant resources. Europe PMC is a trusted repository of choice for many life science funders (Europe PMC, 2022a), offering a suite of innovative search tools that allow users to search and evaluate the literature, including finding highly cited articles, preprints with community peer reviews, or papers referencing a proteomics dataset in the figure legend. In addition, Europe PMC utilizes text-mining to help researchers identify key terms and find data and evidence in the literature. First-time users often do not utilize the wealth of tools Europe PMC offers and can feel overwhelmed about how to perform the most effective search. This protocol, describing how to search and evaluate publications and preprints using Europe PMC, demonstrates how to carry out more efficient and effective literature searches using the tools provided by Europe PMC. This includes discovering the latest findings on a research topic, following research from a specific author, journal, or preprint server, exploring literature on a new method, expanding your reading list with relevant articles, as well as accessing and evaluating publications and preprints of interest. © 2023 EMBL-EBI. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: Finding articles and preprints on a topic of interest
Basic Protocol 2: Accessing an article
Basic Protocol 3: Browsing the article
Basic Protocol 4: Evaluating the article
Basic Protocol 5: Refining search results
Basic Protocol 6: Finding research by author
Basic Protocol 7: Finding a specific article
Basic Protocol 8: Finding information about a methodology
Basic Protocol 9: Finding evidence of biological interactions, relations, and modifications
Basic Protocol 10: Finding data behind a publication
Basic Protocol 11: Expanding a reading list and building a bibliography
Basic Protocol 12: Staying on top of the current literature.


Subject(s)
Biological Science Disciplines , Data Mining , PubMed , Europe , Search Engine
4.
Nucleic Acids Res ; 51(D1): D9-D17, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477213

ABSTRACT

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases make them particularly well suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly available to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.


Subject(s)
Artificial Intelligence , Computational Biology , Data Management , Databases, Factual , Genome , Internet
5.
Nat Comput Sci ; 3(6): 514-521, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38177425

ABSTRACT

The carbon footprint of scientific computing is substantial, but environmentally sustainable computational science (ESCS) is a nascent field with many opportunities to thrive. To realize the immense green opportunities and continued, yet sustainable, growth of computer science, we must take a coordinated approach to our current challenges, including greater awareness and transparency, improved estimation and wider reporting of environmental impacts. Here, we present a snapshot of where ESCS stands today and introduce the GREENER set of principles, as well as guidance for best practices moving forward.

6.
Gigascience ; 11, 2022 08 11.
Article in English | MEDLINE | ID: mdl-35950838

ABSTRACT

Metagenomics is a culture-independent method for studying the microbes inhabiting a particular environment. Comparing the composition of samples (functionally/taxonomically), either from a longitudinal study or cross-sectional studies, can provide clues about how the microbiota has adapted to the environment. However, a recurring challenge, especially when comparing results between independent studies, is that key metadata about the sample and the molecular methods used to extract and sequence the genetic material are often missing from sequence records, making it difficult to account for confounding factors. Nevertheless, these missing metadata may be found in the narrative of publications describing the research. Here, we describe a machine learning framework that automatically extracts essential metadata for a wide range of metagenomics studies from the literature contained in Europe PMC. This framework has enabled the extraction of metadata from 114,099 publications in Europe PMC, including 19,900 publications describing metagenomics studies in the European Nucleotide Archive (ENA) and MGnify. Using this framework, a new metagenomics annotations pipeline was developed and integrated into Europe PMC to regularly enrich up-to-date ENA and MGnify metagenomics studies with metadata extracted from research articles. These metadata are now available for researchers to explore and retrieve on the MGnify and Europe PMC websites, as well as through the Europe PMC Annotations API.


Subject(s)
Metadata , Metagenomics , Access to Information , Cross-Sectional Studies , Longitudinal Studies , Machine Learning , Metagenomics/methods
7.
Nucleic Acids Res ; 50(D1): D11-D19, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850134

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAFold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI) and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been made to EMBL-EBI's online training offering.


Subject(s)
Computational Biology/education , Computational Biology/methods , Databases, Factual , Academies and Institutes , Artificial Intelligence , COVID-19 , Databases, Factual/economics , Databases, Factual/statistics & numerical data , Databases, Pharmaceutical , Databases, Protein , Europe , Genome, Human , Humans , Information Storage and Retrieval , RNA, Untranslated/genetics , SARS-CoV-2/genetics
8.
Nucleic Acids Res ; 49(D1): D29-D37, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33245775

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effectively tackle a global health crisis. EMBL-EBI has been able to build on its position to contribute to the fight against COVID-19 in a number of ways. Firstly, EMBL-EBI has used its infrastructure, expertise and network of international collaborations to help build the European COVID-19 Data Platform (https://www.covid19dataportal.org/), which brings together COVID-19 biomolecular data and connects it to researchers, clinicians and public health professionals. By September 2020, the COVID-19 Data Platform had integrated more than 170 000 COVID-19 biomolecular data and literature records, collected through a number of EMBL-EBI resources. Secondly, EMBL-EBI has strived to continue its support of the life science communities through the crisis, with updated training provision and improved service provision throughout its resources. The COVID-19 pandemic has highlighted the importance of EMBL-EBI's core principles, including international cooperation, resource sharing and central data brokering, and has further empowered scientific cooperation.


Subject(s)
COVID-19/prevention & control , Computational Biology/statistics & numerical data , Databases, Nucleic Acid/statistics & numerical data , Information Storage and Retrieval/methods , SARS-CoV-2/genetics , Viral Proteins/genetics , COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Computational Biology/organization & administration , Databases, Nucleic Acid/organization & administration , Global Health , Humans , Information Storage and Retrieval/statistics & numerical data , Internet , Pandemics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Viral Proteins/metabolism
9.
Nucleic Acids Res ; 49(D1): D1507-D1514, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33180112

ABSTRACT

Europe PMC (https://europepmc.org) is a database of research articles, including peer-reviewed full-text articles and abstracts, and preprints, all freely available for use via the website, APIs and bulk download. This article outlines new developments since 2017, where work has focussed on three key areas: (i) Europe PMC has expanded its core content to include life science preprint abstracts and a special collection of full text of COVID-19-related preprints. Europe PMC is unique as an aggregator of biomedical preprints alongside peer-reviewed articles, with over 180 000 preprints available to search. (ii) Europe PMC has significantly expanded its links to content related to the publications, such as links to Unpaywall, providing wider access to full text, preprint peer-review platforms, all major curated data resources in the life sciences, and experimental protocols. The redesigned Europe PMC website features the PubMed abstract and corresponding PMC full text merged into one article page; there is more evident and user-friendly navigation within articles and to related content, plus a figure browse feature. (iii) The expanded annotations platform offers ∼1.3 billion text-mined biological terms and concepts sourced from 10 providers and over 40 global data resources.


Subject(s)
Biological Science Disciplines/statistics & numerical data , COVID-19/prevention & control , Data Curation/statistics & numerical data , Data Mining/statistics & numerical data , Databases, Factual/statistics & numerical data , PubMed , SARS-CoV-2/isolation & purification , Biological Science Disciplines/methods , Biomedical Research/methods , Biomedical Research/statistics & numerical data , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Data Mining/methods , Epidemics , Europe , Humans , Internet , SARS-CoV-2/physiology
11.
Acta Crystallogr F Struct Biol Commun ; 75(Pt 11): 665-672, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31702580

ABSTRACT

This work presents an annotation tool that automatically locates mentions of particular amino-acid residues in published papers and identifies the protein concerned. These matches can be provided in context or in a searchable format in order for researchers to better use the existing and future literature.


Subject(s)
Molecular Sequence Annotation , Proteins/chemistry , Publications , Amino Acids/chemistry , Automation , Mutation/genetics , Proteins/genetics , Software
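The residue-annotation tool in entry 11 is described only at a high level. As a rough illustration of the kind of pattern matching such a tool might start from, the Python sketch below finds three-letter residue mentions (e.g. "Ser195") and one-letter point mutations (e.g. "S195A") with regular expressions. The patterns and the function name `find_residue_mentions` are hypothetical and are not the published tool's rules; in particular, the sketch does not attempt to identify the protein concerned.

```python
import re

# Hypothetical patterns, not the published tool's rules.
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"

# Three-letter code, optional hyphen, residue number: "Ser195", "Ser-195".
RESIDUE_RE = re.compile(rf"\b(?:{THREE_LETTER})-?\d+\b")
# One-letter wild-type residue, position, one-letter substitute: "S195A".
MUTATION_RE = re.compile(rf"\b[{ONE_LETTER}]\d+[{ONE_LETTER}]\b")


def find_residue_mentions(text: str) -> list[str]:
    """Return residue and point-mutation mentions in order of appearance."""
    hits = [(m.start(), m.group()) for m in RESIDUE_RE.finditer(text)]
    hits += [(m.start(), m.group()) for m in MUTATION_RE.finditer(text)]
    # Merge both kinds of match by their position in the text.
    return [mention for _, mention in sorted(hits)]
```

For example, `find_residue_mentions("The catalytic triad His57, Asp102 and Ser195; the S195A mutant is inactive.")` returns the four mentions in textual order. A pattern-only approach can also match unrelated alphanumeric codes and cannot link a residue to its protein; that step needs context from the surrounding text.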
12.
Nucleic Acids Res ; 46(D1): D1266-D1270, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069414

ABSTRACT

BioStudies (www.ebi.ac.uk/biostudies) is a new public database that organizes data from biological studies. Typically, but not exclusively, a study is associated with a publication. BioStudies offers a simple way to describe the study structure, and provides flexible data deposition tools and data access interfaces. The actual data can be stored in BioStudies, remotely, or both. BioStudies imports supplementary data from Europe PMC, and is a resource for authors and publishers for packaging data during the manuscript preparation process. It can also support the data management needs of collaborative projects. The growth in multiomics experiments and other multi-faceted approaches to life sciences research means that studies result in a diversity of data outputs in multiple locations. BioStudies offers a way to ensure that all these data and the associated publication(s) can be found coherently in the longer term.


Subject(s)
Biological Science Disciplines , Databases, Factual , Animals , Humans , Internet , Software
13.
Nucleic Acids Res ; 46(D1): D1254-D1260, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29161421

ABSTRACT

Europe PMC (https://europepmc.org) is a comprehensive resource of biomedical research publications that offers advanced tools for search, retrieval, and interaction with the scientific literature. This article outlines new developments since 2014. In addition to delivering the core database and services, Europe PMC focuses on three areas of development: individual user services, data integration, and infrastructure to support text and data mining. Europe PMC now provides user accounts to save search queries and claim publications to ORCIDs, as well as open access profiles for authors based on public ORCID records. We continue to foster connections between scientific data and literature in a number of ways. All the data behind the paper - whether in structured archives, generic archives or as supplemental files - are now available via links to the BioStudies database. Text-mined biological concepts, including database accession numbers and data DOIs, are highlighted in the text and linked to the appropriate data resources. The SciLite community annotation platform accepts text-mining results from various contributors and overlays them on research articles as licence allows. In addition, text miners and developers can access all open content via APIs or via the FTP site.


Subject(s)
Biomedical Research , Databases, Bibliographic , Data Mining , Internet , Serial Publications , User-Computer Interface
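Entries 9 and 13 note that Europe PMC content is available via APIs. A minimal sketch, assuming the Europe PMC RESTful search endpoint at `https://www.ebi.ac.uk/europepmc/webservices/rest/search` and its `query`, `format` and `pageSize` parameters (check the current Europe PMC web service documentation before relying on them), shows how a search URL can be built with the Python standard library. The helper name `build_search_url` is hypothetical, and the HTTP request itself is left out so the example runs offline.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Europe PMC RESTful search service; verify against the
# current Europe PMC web service documentation before use.
SEARCH_ENDPOINT = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, page_size: int = 25, fmt: str = "json") -> str:
    """Return a search URL for `query`; sending the request is left to the caller."""
    params = {"query": query, "format": fmt, "pageSize": page_size}
    # urlencode percent-encodes quotes and colons and turns spaces into '+'.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

For example, `build_search_url("SciLite")` gives `https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=SciLite&format=json&pageSize=25`. The resulting URL can be fetched with `urllib.request.urlopen` and the JSON body parsed with `json.load`.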
14.
J Biomed Semantics ; 8(1): 20, 2017 Jun 06.
Article in English | MEDLINE | ID: mdl-28587637

ABSTRACT

BACKGROUND: We present the Europe PMC literature component of Open Targets, a target validation platform that integrates various types of evidence to aid drug target identification and validation. The component identifies target-disease associations in documents from the Europe PMC literature database and ranks the documents by confidence, using rules based on expert-provided heuristic information. The confidence score of a given document represents how valuable the document is for validating a given target-disease association, taking into account the credibility of the association based on the properties of the text. The component has regularly supplied the platform with up-to-date data since December 2015. RESULTS: Currently, there are a total of 1,168,365 distinct target-disease associations text mined from >26 million PubMed abstracts and >1.2 million Open Access full-text articles. Our comparative analyses of the evidence data currently available in the platform revealed that 850,179 of these associations are identified exclusively by literature mining. CONCLUSIONS: This component helps the platform's users by providing the most relevant literature hits for a given target and disease. The text-mining evidence, along with the other types of evidence, can be explored visually through https://www.targetvalidation.org, and all the evidence data are available for download in JSON format from https://www.targetvalidation.org/downloads/data.


Subject(s)
Biological Ontologies , Molecular Targeted Therapy , Data Mining , Documentation , Publications , Reproducibility of Results
15.
PLoS Biol ; 15(6): e2001414, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28662064

ABSTRACT

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.


Subject(s)
Biological Science Disciplines/methods , Computational Biology/methods , Data Mining/methods , Software Design , Software , Biological Science Disciplines/statistics & numerical data , Biological Science Disciplines/trends , Computational Biology/trends , Data Mining/statistics & numerical data , Data Mining/trends , Databases, Factual/statistics & numerical data , Databases, Factual/trends , Forecasting , Humans , Internet
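Entry 15 stresses the persistence and web resolvability of identifiers. As a hedged illustration, the sketch below splits a compact `prefix:accession` identifier and turns it into a resolver URL. Using identifiers.org as the resolver, and the names `CompactId`, `parse_compact_id` and `resolver_url`, are assumptions made for this example, not recommendations taken from the article.

```python
from typing import NamedTuple

# Assumed default resolver; identifiers.org is used here only as an example of
# a public resolver for compact identifiers.
DEFAULT_RESOLVER = "https://identifiers.org/"


class CompactId(NamedTuple):
    """A compact identifier split into namespace prefix and local accession."""
    prefix: str
    accession: str


def parse_compact_id(text: str) -> CompactId:
    """Split 'prefix:accession' at the first colon only, so accessions that
    themselves contain colons (e.g. 'CHEBI:15377') are kept intact."""
    prefix, sep, accession = text.strip().partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"expected 'prefix:accession', got {text!r}")
    return CompactId(prefix, accession)


def resolver_url(text: str, base: str = DEFAULT_RESOLVER) -> str:
    """Turn a compact identifier into a web URL on the given resolver."""
    cid = parse_compact_id(text)
    return f"{base}{cid.prefix}:{cid.accession}"
```

For example, `resolver_url("taxonomy:9606")` yields `https://identifiers.org/taxonomy:9606`. Keeping the resolver as a parameter means that stored identifiers do not hard-code a single web location, which is one way to separate an identifier's persistence from the resolver's.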
16.
Nucleic Acids Res ; 45(D1): D985-D994, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899665

ABSTRACT

We have designed and developed a data integration and visualization platform that provides evidence about the association of known and potential drug targets with diseases. The platform is designed to support identification and prioritization of biological targets for follow-up. Each drug target is linked to a disease using integrated genome-wide data from a broad range of data sources. The platform provides either a target-centric workflow to identify diseases that may be associated with a specific target, or a disease-centric workflow to identify targets that may be associated with a specific disease. Users can easily transition between these target- and disease-centric workflows. The Open Targets Validation Platform is accessible at https://www.targetvalidation.org.


Subject(s)
Computational Biology/methods , Molecular Targeted Therapy , Search Engine , Software , Databases, Factual , Humans , Molecular Targeted Therapy/methods , Reproducibility of Results , Web Browser , Workflow
17.
Article in English | MEDLINE | ID: mdl-28025348

ABSTRACT

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large-scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions, including (i) the 'scalability' issue due to the increasing need to mine information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue, i.e. the difficulty of applying trained systems to text genres not seen during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.


Subject(s)
Biomedical Research , Data Curation/methods , Data Mining/methods
18.
Article in English | MEDLINE | ID: mdl-27589961

ABSTRACT

Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. Full-level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. Partial-level participation was designed to focus on usability aspects of the interface and not on performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and were then asked whether they were able to achieve each task and how difficult it was to complete. In this manuscript, we describe the development of the interactive task, from planning to execution, and discuss major findings for the systems tested. Database URL: http://www.biocreative.org.


Subject(s)
Data Curation/methods , Data Mining/methods , Electronic Data Processing/methods
19.
F1000Res ; 5, 2016.
Article in English | MEDLINE | ID: mdl-27092246

ABSTRACT

Data from open access biomolecular data resources, such as the European Nucleotide Archive and the Protein Data Bank, are extensively reused within life science research for comparative studies, method development and to derive new scientific insights. Indicators that estimate the extent and utility of such secondary use of research data need to reflect this complex and highly variable data usage. By linking open access scientific literature, via Europe PubMed Central, to the metadata in biological data resources, we separate data citations associated with a deposition statement from citations that capture the subsequent, long-term reuse of data in academia and industry. We extend this analysis to begin to investigate citations of biomolecular resources in patent documents. We find citations in more than 8,000 patents from 2014, demonstrating substantial use and an important role for data resources in defining biological concepts in granted patents to both academic and industrial innovators. Taken together, our results indicate that citation patterns in biomedical literature and patents vary, not only due to citation practice but also according to the data resource cited. The results caution against the use of simple metrics such as citation counts and show that indicators of data use must not only take into account citations within the biomedical literature but also include reuse of data in industry and other parts of society, by including patents and other scientific and technical documents such as guidelines, reports and grant applications.

20.
Wellcome Open Res ; 1: 25, 2016.
Article in English | MEDLINE | ID: mdl-28948232

ABSTRACT

The tremendous growth in biological data has resulted in an increase in the number of research papers being published. This presents a great challenge for scientists in searching and assimilating facts described in those papers. In particular, biological databases depend on curators to add highly precise and useful information that is usually extracted by reading research articles. Therefore, there is an urgent need to find ways to improve linking literature to the underlying data, thereby minimising the effort in browsing content and identifying key biological concepts. As part of the development of Europe PMC, we have developed a new platform, SciLite, which integrates text-mined annotations from different sources and overlays those outputs on research articles. The aim is to help researchers and curators using Europe PMC find key concepts more easily and to provide links to related resources or tools, bridging the gap between literature and biological data.
