Results 1 - 20 of 228
2.
Stud Health Technol Inform ; 270: 397-401, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570414

ABSTRACT

Next-generation sequencing (NGS) technologies allow an improved understanding of pathogens. In the upstream steps of generating genomic data, however, there is still a lack of process-oriented tools for managing the corresponding metadata. In this paper, we describe how a process-oriented software prototype was developed that allows the capture and collation of the metadata involved in NGS. Our question was: how can an interactive web application be developed that supports the process-oriented management of genetic data independent of any sequencing technique?


Subjects
High-Throughput Nucleotide Sequencing, Software, Genomics, Metadata
5.
Article in German | MEDLINE | ID: mdl-32424556

ABSTRACT

The National Action Plan for People with Rare Diseases contains 52 concrete actions, including in the fields of care, research, diagnosis, and information management. With the aim of improving the quality and interoperability of national registries in the long term, action 28 proposed the establishment of a "Rare Diseases Registry" strategy group. The strategy group began its work in 2016. In this report, the group takes developments at the national and international level into account in order to derive recommendations for national initiatives. In addition, the group reports on consent and implementation as well as on the adaptation of a minimal dataset for use in rare disease registries and on mapping the data elements and schemata used to a metadata repository. This position paper was created by the strategy group together with additional authors; it was agreed by consensus within the strategy group and can be seen as a concept paper of the Rare Diseases Registry strategy group.


Subjects
Metadata, Rare Diseases, Confidentiality, Germany, Humans, Registries
6.
PLoS One ; 15(3): e0228885, 2020.
Article in English | MEDLINE | ID: mdl-32134940

ABSTRACT

A citation is regarded as a potential indicator of linkage between research articles. Citations have been employed extensively for diverse academic purposes, such as calculating the impact factor of journals and the h-index of researchers, allocating research grants, and identifying the latest research trends. The current state of the art contends that not all citations are of equal importance. Based on this argument, the citation classification community now categorizes citations into important and non-important classes. The community has proposed different approaches to identify important citations, such as citation-count-based, context-based, metadata-based, and textual approaches, but it largely ignores potentially valuable features that could play a vital role in citation classification. This research presents a novel approach for binary citation classification that exploits section-wise in-text citation frequencies, a similarity score, and overall citation-count-based features. The study also introduces a machine-learning-based approach for assigning appropriate weights to the logical sections of research papers; these weights are applied to citations according to the section in which they occur. To perform the classification, we used three techniques: Support Vector Machine, Kernel Linear Regression, and Random Forest. The experiment was performed on two annotated benchmark datasets that contain 465 and 311 citation pairs of research articles, respectively. The results revealed that the proposed approach attained improved precision (0.84 vs 0.72) compared with the contemporary state-of-the-art approach.
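
As a rough illustration of the section-weighted citation classification this abstract describes, the following minimal sketch builds a feature vector from section-wise in-text citation frequencies, a similarity score, and total citation count, then trains a Random Forest. The section weights, feature names, and toy data are hypothetical placeholders, not values from the paper.

```python
# Sketch: binary classification of citations as important / non-important.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical section weights (the paper learns such weights with ML).
SECTION_WEIGHTS = {"introduction": 0.2, "related_work": 0.3,
                   "methods": 1.0, "results": 0.8, "conclusion": 0.5}

def features(citation):
    """Turn one citing-cited pair into a numeric feature vector."""
    weighted_freq = sum(SECTION_WEIGHTS[s] * n
                        for s, n in citation["section_counts"].items())
    return [weighted_freq, citation["similarity"], citation["total_count"]]

# Toy annotated pairs: 1 = important, 0 = non-important.
pairs = [
    {"section_counts": {"methods": 3, "results": 1}, "similarity": 0.61,
     "total_count": 4, "label": 1},
    {"section_counts": {"introduction": 1}, "similarity": 0.12,
     "total_count": 1, "label": 0},
    {"section_counts": {"related_work": 2}, "similarity": 0.20,
     "total_count": 2, "label": 0},
    {"section_counts": {"methods": 2, "conclusion": 1}, "similarity": 0.55,
     "total_count": 3, "label": 1},
]
X = np.array([features(p) for p in pairs])
y = np.array([p["label"] for p in pairs])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=2).mean())
```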


Subjects
Journal Impact Factor, Periodicals as Topic/statistics & numerical data, Humans, Linear Models, Metadata, Support Vector Machine
7.
PLoS Genet ; 16(2): e1008576, 2020 02.
Article in English | MEDLINE | ID: mdl-32053607

ABSTRACT

Although Plasmodium vivax parasites are the predominant cause of malaria outside of sub-Saharan Africa, they are not always prioritised by elimination programmes. P. vivax is resilient and poses challenges through its ability to re-emerge from dormancy in the human liver. With growing drug resistance and increasing reports of life-threatening infections, new tools to inform elimination efforts are needed. To halt transmission, we need to better understand transmission dynamics, the movement of parasites, and the reservoirs of infection so that targeted interventions can be designed. Molecular genetics and epidemiology for tracking and studying malaria parasite populations have been applied successfully to P. falciparum, and here we sought to develop a molecular genetic tool for P. vivax. By assembling the largest set of P. vivax whole genome sequences (n = 433) spanning 17 countries, and applying a machine learning approach, we created a 71 SNP barcode with high predictive ability to identify geographic origin (91.4%). Further, due to the inclusion of markers for within-population variability, the barcode may also distinguish local transmission networks. Using P. vivax data from a low-transmission setting in Malaysia, we demonstrate the potential ability to infer outbreak events. By characterising the barcoding SNP genotypes in P. vivax DNA sourced from UK travellers (n = 132) to ten malaria endemic countries predominantly not used in the barcode construction, we correctly predicted the geographic region of infection origin. Overall, the 71 SNP barcode outperforms previously published genotyping methods and, when rolled out within new portable platforms, is likely to be an invaluable tool for informing targeted interventions towards elimination of this resilient human malaria.
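
A minimal sketch, in the spirit of the barcode-based geographic classification described above: a Random Forest trained on a genotype matrix to predict region of origin. The genotypes, region labels, and sample sizes below are random placeholders, not the published 71-SNP barcode.

```python
# Sketch: predicting geographic origin from a SNP barcode (toy data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_snps = 433, 71
X = rng.integers(0, 2, size=(n_samples, n_snps))      # 0/1 allele calls
regions = rng.choice(["S.America", "S.Asia", "SE.Asia", "E.Africa"], n_samples)

X_tr, X_te, y_tr, y_te = train_test_split(X, regions, test_size=0.25,
                                          random_state=0, stratify=regions)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))    # ~chance on random data
```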


Subjects
Disease Outbreaks/prevention & control, Protozoan Genome/genetics, Genotyping Techniques/methods, Vivax Malaria/transmission, Plasmodium vivax/genetics, Eastern Africa, Asia, Datasets as Topic, Disease Eradication/methods, Genetic Markers/genetics, Genotype, Geography, Humans, Vivax Malaria/epidemiology, Vivax Malaria/parasitology, Metadata, Microsatellite Repeats/genetics, Plasmodium vivax/isolation & purification, Single Nucleotide Polymorphism/genetics, Predictive Value of Tests, South America, Travel-Related Illness, United Kingdom, Whole Genome Sequencing
8.
Database (Oxford) ; 2019, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31960040

ABSTRACT

Data and metadata interoperability between data storage systems is a critical component of the FAIR data principles. Programmatic and consistent means of reconciling metadata models between databases promote data exchange and thus increase the accessibility of data to the scientific community. This process requires (i) metadata mapping between the models and (ii) software to perform the mapping. Here, we describe our efforts to map metadata associated with genome assemblies between the National Center for Biotechnology Information (NCBI) data resources and the Chado biological database schema. We present mappings for multiple NCBI data structures and introduce a Tripal software module, Tripal EUtils, to pull metadata from NCBI into a Tripal/Chado database. We discuss potential mapping challenges and solutions and provide suggestions for future development to further increase interoperability between these platforms. Database URL: https://github.com/NAL-i5K/tripal_eutils.
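
The kind of retrieval that Tripal EUtils automates can be sketched with a direct call to the NCBI E-utilities `esummary` endpoint. The assembly UID and the summary field names used below are illustrative assumptions, and the "Chado-style" dictionary is only a stand-in for the module's actual mapping.

```python
# Sketch: pulling genome-assembly metadata from NCBI E-utilities.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def assembly_summary(uid):
    """Fetch an esummary record for one assembly UID as JSON."""
    r = requests.get(f"{EUTILS}/esummary.fcgi",
                     params={"db": "assembly", "id": uid, "retmode": "json"},
                     timeout=30)
    r.raise_for_status()
    return r.json()["result"][str(uid)]

doc = assembly_summary(202931)          # hypothetical assembly UID
# Map a few summary fields onto a Chado-style property dict (illustrative).
chado_props = {
    "assembly_name": doc.get("assemblyname"),
    "submitter":     doc.get("submitterorganization"),
    "release_date":  doc.get("asmreleasedate_genbank"),
}
print(chado_props)
```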


Subjects
Computational Biology/methods, Genetic Databases, Genome, Metadata, Programming Languages, Algorithms, Animals, Genomics, Information Storage and Retrieval, Invertebrates/genetics, National Library of Medicine (U.S.), Plants/genetics, Software, United States
9.
Nucleic Acids Res ; 48(4): e23, 2020 02 28.
Article in English | MEDLINE | ID: mdl-31956905

ABSTRACT

The diverse and growing omics data in public domains provide researchers with tremendous opportunity to extract hidden, yet undiscovered, knowledge. However, the vast majority of archived data remain unused. Here, we present MetaOmGraph (MOG), free, open-source, standalone software for exploratory analysis of massive datasets. Researchers, without coding, can interactively visualize and evaluate data in the context of their metadata, honing in on groups of samples or genes based on attributes such as expression values, statistical associations, metadata terms and ontology annotations. Interaction with data is easy via interactive visualizations such as line charts, box plots, scatter plots, histograms and volcano plots. Statistical analyses include co-expression analysis, differential expression analysis and differential correlation analysis, with significance tests. Researchers can send data subsets to R for additional analyses. Multithreading and indexing enable efficient big data analysis. A researcher can create a new MOG project from any numerical data or explore an existing one. MOG projects, with their history of explorations, can be saved and shared. We illustrate MOG with case studies of large curated datasets: human cancer RNA-Seq, where we identify novel putative biomarker genes in different tumors, and microarray and metabolomics data from Arabidopsis thaliana. MOG executable and code: http://metnetweb.gdcb.iastate.edu/ and https://github.com/urmi-21/MetaOmGraph/.
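
The co-expression query that MOG exposes interactively can be approximated programmatically, as in this minimal sketch on a toy genes-by-samples matrix; the expression values and gene names are random placeholders, and the code is not MOG's implementation.

```python
# Sketch: genes most co-expressed with a gene of interest (Pearson correlation).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
expr = pd.DataFrame(rng.normal(size=(50, 30)),
                    index=[f"gene{i}" for i in range(50)],
                    columns=[f"sample{j}" for j in range(30)])

target = expr.loc["gene0"]
corr = expr.apply(lambda row: row.corr(target), axis=1).drop("gene0")
print(corr.sort_values(ascending=False).head(5))
```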


Subjects
Big Data, Gene Expression Profiling/statistics & numerical data, Gene Expression Regulation/genetics, Software, Data Analysis, Statistical Data Interpretation, Humans, Metadata/statistics & numerical data
10.
Br J Radiol ; 93(1109): 20190574, 2020 May 01.
Article in English | MEDLINE | ID: mdl-31971816

ABSTRACT

Healthcare is increasingly and routinely generating large volumes of data from different sources, which are difficult to handle and integrate. Confidence in data can be established through the knowledge that the data are validated, well curated, and contain minimal bias or errors. As the National Measurement Institute of the UK, the National Physical Laboratory (NPL) is running an interdisciplinary project on digital health data curation. The project addresses one of the key challenges of the UK's Measurement Strategy: to provide confidence in the intelligent and effective use of data. A workshop was organised by NPL in which important stakeholders from the NHS, industry and academia outlined the current and future challenges in healthcare data curation. This paper summarises the findings of the workshop and outlines NPL's views on how a metrological approach to the curation of healthcare data sets could help solve some of the important and emerging challenges of utilising healthcare data.


Subjects
Data Collection/methods, Medical Informatics/methods, Research Design/standards, Data Collection/standards, Diffusion of Innovation, Humans, Medical Informatics/standards, Metadata/standards, Telemedicine/methods, Telemedicine/standards, United Kingdom
11.
Handb Exp Pharmacol ; 257: 277-297, 2020.
Article in English | MEDLINE | ID: mdl-31792682

ABSTRACT

While research data has become integral to the scholarly endeavour, a number of challenges hinder its development, management and dissemination. This chapter follows the life cycle of research data, considering aspects ranging from storage and preservation to sharing and legal factors. While it provides a broad overview of the current ecosystem, it also pinpoints the elements that make up modern research data sharing practice, such as metadata creation, the FAIR principles, identifiers, Creative Commons licencing and the various repository options. Furthermore, the chapter discusses the mandates and regulations that influence data sharing and the possible technological means of overcoming their complexity, such as blockchain systems.


Subjects
Ecosystem, Information Storage and Retrieval, Data Collection, Information Dissemination, Metadata
12.
Handb Exp Pharmacol ; 257: 299-317, 2020.
Article in English | MEDLINE | ID: mdl-31620915

ABSTRACT

Any given research claim can be made with a degree of confidence that a phenomenon is present, with an estimate of the precision of the observed effects and a prediction of the extent to which the findings might hold true under different experimental or real-world conditions. In some situations, the certainty and precision obtained from a single study are sufficient to reliably inform future research decisions. However, in other situations greater certainty is required. This might be the case where a substantial research investment is planned, a pivotal claim is to be made or the launch of a clinical trial programme is being considered. Under these circumstances, some form of summary of findings across studies may be helpful. Summary estimates can describe findings from exploratory (observational) or hypothesis-testing experiments, but importantly, the creation of such summaries is, in itself, observational rather than experimental research. The process is therefore particularly at risk from selective identification of literature to be included, and this can be addressed using systematic search strategies and pre-specified criteria for inclusion and exclusion against which possible contributing data will be assessed. This characterises a systematic review (in contrast to nonsystematic or narrative reviews). In meta-analysis, there is an attempt to provide a quantitative summary of such research findings.
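
The quantitative step that distinguishes a meta-analysis from a narrative summary can be illustrated with a fixed-effect, inverse-variance-weighted pooling of study effects. The effect sizes and standard errors below are invented for illustration only.

```python
# Sketch: fixed-effect inverse-variance meta-analysis of toy study effects.
import numpy as np
from scipy import stats

effects = np.array([0.42, 0.31, 0.55, 0.18])   # per-study effect estimates
se      = np.array([0.20, 0.15, 0.25, 0.10])   # per-study standard errors

weights   = 1.0 / se**2                         # inverse-variance weights
pooled    = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
z = pooled / pooled_se
p = 2 * stats.norm.sf(abs(z))

print(f"pooled effect = {pooled:.3f} ± {1.96 * pooled_se:.3f} (p = {p:.3g})")
```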


Subjects
Metadata, Humans
13.
Methods Inf Med ; 58(S 02): e72-e79, 2019 12.
Article in English | MEDLINE | ID: mdl-31853911

ABSTRACT

BACKGROUND: Secondary use of routine medical data relies on a shared understanding of the given information. This understanding is achieved through metadata and their interconnections, which can be stored in metadata repositories (MDRs). The necessity of an MDR is well understood, but local work on metadata is a time-consuming and challenging process for domain experts. OBJECTIVE: To support the identification, collection, and provision of metadata in a predefined structured manner to foster consolidation, with a particular focus on user acceptance. METHODS: We propose the software pipeline MDRBridge as a practical intermediary for metadata capture and processing. It is based on MDRSheet, an ISO 11179-3-compliant template implemented in popular spreadsheet software. Because the metadata have different origins, both manual entry and automatic extraction from application systems are supported. To enable the export of collected metadata into external MDRs, a mapping of ISO 11179 to the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM) was developed. RESULTS: MDRSheet is embedded in the processing pipeline MDRBridge and delivers metadata in the CDISC ODM format for further use in MDRs. This approach is used to interactively unify core datasets, import existing standard datasets, and automatically extract all defined data elements from source systems. The involvement of clinical domain experts improved significantly because only minimal changes to their usual work routine were required. CONCLUSION: A high degree of acceptance was achieved by adapting to the working methods of clinical domain experts. The designed process is capable of transforming all relevant data elements according to the ISO 11179-3 format. MDRSheet is used as an intermediate format to present the information at a glance and to allow editing or supplementing by domain experts.
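
A minimal sketch of the export step described here: turning spreadsheet-captured data-element descriptions (one row per element, as in an MDRSheet-style template) into CDISC ODM ItemDef entries. The column names and the heavily reduced ODM structure are illustrative assumptions, not the pipeline's actual mapping.

```python
# Sketch: spreadsheet rows -> minimal CDISC ODM ItemDef XML.
import xml.etree.ElementTree as ET

rows = [  # would normally be read from the spreadsheet, e.g. with pandas
    {"oid": "I.AGE", "name": "Age at diagnosis",  "datatype": "integer"},
    {"oid": "I.SEX", "name": "Administrative sex", "datatype": "text"},
]

odm = ET.Element("ODM")
study = ET.SubElement(odm, "Study", OID="S.EXAMPLE")
mdv = ET.SubElement(study, "MetaDataVersion", OID="MDV.1", Name="Core dataset")
for r in rows:
    ET.SubElement(mdv, "ItemDef", OID=r["oid"], Name=r["name"],
                  DataType=r["datatype"])

print(ET.tostring(odm, encoding="unicode"))
```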


Subjects
Data Analysis, Databases as Topic, Medical Informatics, Metadata, User-Computer Interface
14.
BMC Bioinformatics ; 20(1): 542, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31675914

ABSTRACT

BACKGROUND: In biological experiments, comprehensive experimental metadata tracking - which comprises experiment, reagent, and protocol annotation with controlled vocabulary from established ontologies - remains a challenge, especially when the experiment involves multiple laboratory scientists who execute different steps of the protocol. Here we describe Annot, a novel web application designed to provide a flexible solution for this task. RESULTS: Annot enforces the use of controlled vocabulary for sample and reagent annotation while enabling robust investigation, study, and protocol tracking. The cornerstone of Annot's implementation is a JSON-compatible file format, which can capture detailed metadata for all aspects of complex biological experiments. Data stored in this JSON format can easily be ported into spreadsheet or data frame files that can be loaded into R ( https://www.r-project.org/ ) or Pandas, Python's data analysis library ( https://pandas.pydata.org/ ). Annot is implemented in Python 3 and utilizes the Django web framework, PostgreSQL, Nginx, and Debian. It is deployed via Docker and supports all major browsers. CONCLUSIONS: Annot offers a robust solution for annotating samples, reagents, and experimental protocols for established assays where multiple laboratory scientists are involved. Further, it provides a framework to store and retrieve metadata for data analysis and integration, and therefore ensures that data generated in different experiments can be integrated and jointly analyzed. This type of solution to metadata tracking can enhance the utility of large-scale datasets, which we demonstrate here with a large-scale microenvironment microarray study.
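
The "port JSON metadata into a data frame" step described above can be sketched as follows; the file layout shown is a guess at a generic sample/reagent annotation structure, not Annot's actual schema.

```python
# Sketch: flatten JSON experiment metadata into a pandas DataFrame.
import json
import pandas as pd

raw = json.loads("""
{
  "samples": [
    {"id": "s1", "cell_line": "MCF7",   "reagent": {"name": "EGF",  "dose_ng_ml": 10}},
    {"id": "s2", "cell_line": "MCF10A", "reagent": {"name": "TGFB", "dose_ng_ml": 5}}
  ]
}
""")

df = pd.json_normalize(raw["samples"])   # nested reagent fields become columns
print(df[["id", "cell_line", "reagent.name", "reagent.dose_ng_ml"]])
```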


Subjects
Computational Biology/methods, Data Curation/methods, Indicators and Reagents/supply & distribution, Metadata, Biological Specimen Banks/statistics & numerical data, Software, Controlled Vocabulary
15.
Gigascience ; 8(10), 2019 10 01.
Article in English | MEDLINE | ID: mdl-31648301

ABSTRACT

Increasingly sophisticated experiments, coupled with large-scale computational models, have the potential to systematically test biological hypotheses to drive our understanding of multicellular systems. In this short review, we explore key challenges that must be overcome to achieve robust, repeatable data-driven multicellular systems biology. If these challenges can be solved, we can grow beyond the current state of isolated tools and datasets to a community-driven ecosystem of interoperable data, software utilities, and computational modeling platforms. Progress is within our grasp, but it will take community (and financial) commitment.


Subjects
Systems Biology/methods, Big Data, Metadata
16.
PLoS One ; 14(10): e0223984, 2019.
Article in English | MEDLINE | ID: mdl-31626635

ABSTRACT

In the past, scientists reported summaries of their findings; they did not provide their original data collections. Many stakeholders (e.g., funding agencies) are now requesting that such data be made publicly available. This mandate is being adopted to facilitate further discovery and to mitigate waste and deficits in the research process. At the same time, the necessary infrastructure for data curation (e.g., repositories) has been evolving. The current target is to make research products FAIR (Findable, Accessible, Interoperable, Reusable), resulting in data that are curated and archived to be both human and machine compatible. However, most scientists have little training in data curation. Specifically, they are ill-equipped to annotate their data collections at a level that facilitates discoverability, aggregation, and broad reuse in a context separate from their creation or sub-field. To circumvent these deficits, data architects may collaborate with scientists to transform and curate data. This paper's example of a data collection describes the electrical properties of outer hair cells isolated from the mammalian cochlea. The data are expressed with a variant of The Ontology for Biomedical Investigations (OBI), mirrored to provide the metadata and nested data architecture used within the Hierarchical Data Format version 5 (HDF5) format. Each digital specimen is displayed in a tree configuration (like directories on a computer) and consists of six main branches based on the ontology classes. The data collections, scripts, and ontological OWL file (OBI-based Inner Ear Electrophysiology (OBI_IEE)) are deposited in three repositories. We discuss the impediments to producing such data collections for public use, and the tools and processes required for effective implementation. This work illustrates the impact that small collaborations can have on the curation of our publicly funded collections, and is particularly salient for fields where data is sparse, throughput is low, and sacrifice of animals is required for discovery.
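
A minimal sketch of the tree-structured storage described above: one "digital specimen" written as nested HDF5 groups whose top-level branches mirror ontology classes. The group names, dataset names, and attributes here are illustrative, not the actual OBI_IEE terms.

```python
# Sketch: ontology-class branches stored as nested HDF5 groups with h5py.
import h5py
import numpy as np

with h5py.File("specimen_001.h5", "w") as f:
    f.attrs["ontology"] = "OBI-based Inner Ear Electrophysiology (OBI_IEE)"
    for branch in ["organism", "specimen", "protocol",
                   "instrument", "data", "analysis"]:
        f.create_group(branch)
    rec = f["data"].create_dataset("membrane_current_pA",
                                   data=np.zeros(1000, dtype="f4"))
    rec.attrs["sampling_rate_hz"] = 10000.0
    f["organism"].attrs["species"] = "Cavia porcellus"  # hypothetical value
```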


Subjects
Inner Ear/physiology, Computational Biology/methods, Data Curation, Factual Databases, Electrophysiological Phenomena, Humans, Metadata
17.
Stud Health Technol Inform ; 267: 66-73, 2019 Sep 03.
Article in English | MEDLINE | ID: mdl-31483256

ABSTRACT

Data integration is the problem of combining data residing at different sources and providing the user with a unified view of these data. In medical informatics, such a unified view enables retrospective analyses based on more facts, and prospective recruitment of more patients, than any single data collection could provide by itself. The technical part of data integration is based on rules interpreted by software; these rules define how source database schemata are translated into the target database schema. Translation rules are formulated by data managers, who usually lack knowledge about the meaning and acquisition methods of the data they handle. The professionals collecting the source data (data providers), who do have that knowledge, usually lack a sufficient technical background. Since data providers can neither formulate the transformation rules themselves nor validate them, the whole process is fault-prone. Additionally, in the continuous development and maintenance of (meta-)data repositories, data structures are subject to change, which may render transformation rules outdated. We did not find any technical solution that enables data providers to formulate transformation rules themselves or that provides an understandable presentation of given rules. Our approach is to enable data providers to understand the rules applied to their own data by presenting the rules and the available context visually; context information is fetched from a metadata repository. In this paper, we propose a software tool that builds on existing data integration infrastructures and provides a visually supported validation routine for data integration rules. As a first step towards its evaluation, we integrated the tool into the DZL data integration process and verified the correct presentation of transformation rules.
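
To make the notion of a transformation rule concrete, here is a minimal sketch of a declarative source-to-target rule and its application, the kind of artefact a data provider would need to review. The field names, code mapping, and rule format are invented for this example and are not the DZL rule syntax.

```python
# Sketch: declarative source->target transformation rules applied to a record.
RULES = [
    {"source": "sex",       "target": "administrative_gender",
     "map": {"m": "male", "f": "female", "u": "unknown"}},
    {"source": "height_cm", "target": "body_height_m",
     "transform": lambda v: round(v / 100.0, 2)},
]

def apply_rules(record, rules):
    out = {}
    for rule in rules:
        value = record[rule["source"]]
        if "map" in rule:
            value = rule["map"][value]
        if "transform" in rule:
            value = rule["transform"](value)
        out[rule["target"]] = value
    return out

source_record = {"sex": "f", "height_cm": 172}
print(apply_rules(source_record, RULES))
# {'administrative_gender': 'female', 'body_height_m': 1.72}
```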


Subjects
Metadata, Semantics, Software, Factual Databases, Humans, Prospective Studies, Retrospective Studies
18.
Stud Health Technol Inform ; 267: 74-80, 2019 Sep 03.
Article in English | MEDLINE | ID: mdl-31483257

ABSTRACT

The utilisation of metadata repositories increasingly promotes secondary use of routinely collected data. However, this has not yet solved the problem of data exchange across organisational boundaries: the local description of a metadata set must also be exchangeable for flawless data exchange. In previous work, the metadata exchange language QL4MDR was developed; the present work aimed to examine its applicability. For this purpose, existing MDR implementations were identified, systematically inspected, and roughly divided into two categories to distinguish between data integration and query integration. It was shown that all of the implementations can be adapted to QL4MDR. The integration of metadata is an important first step: it enables the exchange of the information, from metadata mappings to transformation rules, that is urgently needed for the further processing of instance data.


Subjects
Metadata
19.
Stud Health Technol Inform ; 267: 86-92, 2019 Sep 03.
Article in English | MEDLINE | ID: mdl-31483259

ABSTRACT

Interoperability is a growing demand in healthcare, where heterogeneous data sources complicate information transfer. Interoperability issues can be addressed by metadata repositories: these help to ensure syntactic interoperability, such as compatible data formats or value ranges, but semantic interoperability in particular remains challenging. Semantic annotation with standardized terminologies and classifications fosters semantic interoperability. This work aims to interconnect Samply.MDR and the Portal of Medical Data Models (MDM-Portal) to facilitate semantic annotation with UMLS. To this end, Samply.MDR was extended to store semantic information. While a data element is being created, a request is sent to MDM, which returns candidate UMLS codes. The user can then adopt the most suitable code and select a link type between the code and the element itself. Successful enrichment of data elements with UMLS codes was demonstrated by interconnecting Samply.MDR and MDM-Portal.
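
A rough sketch of the annotation flow described above: request candidate UMLS codes for a new data element and attach the chosen code with a link type. The endpoint URL, response layout, and link-type vocabulary are hypothetical placeholders; the real Samply.MDR and MDM-Portal interfaces will differ.

```python
# Sketch: enrich a data element with a user-selected UMLS code.
import requests

ANNOTATION_ENDPOINT = "https://example.org/mdm/api/umls-candidates"  # placeholder

def suggest_umls_codes(designation):
    r = requests.get(ANNOTATION_ENDPOINT, params={"text": designation}, timeout=30)
    r.raise_for_status()
    return r.json()          # assumed: list of {"cui": ..., "label": ...}

element = {"designation": "systolic blood pressure", "slots": []}
candidates = suggest_umls_codes(element["designation"])
if candidates:
    chosen = candidates[0]   # in the UI, the user picks the best match
    element["slots"].append({"cui": chosen["cui"],
                             "link_type": "equivalent"})   # user-selected link type
print(element)
```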


Subjects
Metadata, Semantics
20.
Stud Health Technol Inform ; 267: 230-237, 2019 Sep 03.
Article in English | MEDLINE | ID: mdl-31483277

ABSTRACT

The German Center for Lung Research (DZL) is a research network with the aim of investigating respiratory diseases. In order to enable consortium-wide retrospective research and prospective patient recruitment, we perform data integration into a central data warehouse. Enhancement of the underlying ontology is an ongoing process, for which we developed the Collaborative Metadata Repository (CoMetaR) tool. Its technical infrastructure is based on the Resource Description Framework (RDF) for ontology representation and the distributed version control system Git for storage and versioning. Ontology development involves a considerable amount of data curation, and data provenance improves its feasibility and quality. Especially in collaborative metadata development, comprehensive annotation of "who contributed what, when and why" is essential. Although RDF and Git versioning repositories are commonly used, no existing solution captures metadata provenance information in sufficient detail. We propose an enhanced composition of standardized RDF statements for detailed provenance representation. Additionally, we developed an algorithm that extracts provenance data from the repository and translates it into the proposed RDF statements.
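
As a minimal illustration of expressing "who contributed what, when and why" for one Git commit as RDF, the sketch below uses W3C PROV terms with rdflib. The authors' proposed statement composition is richer; the namespace, resource names, and the ad-hoc `reason` property here are assumptions made for the example.

```python
# Sketch: PROV-style provenance statements for a single Git commit.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("https://example.org/cometar/")        # placeholder namespace

g = Graph()
commit = EX["commit/3f2a9c1"]
author = EX["person/jdoe"]
concept = EX["concept/fev1"]

g.add((commit, RDF.type, PROV.Activity))
g.add((commit, PROV.wasAssociatedWith, author))                      # "who"
g.add((commit, PROV.endedAtTime,
       Literal("2019-05-07T14:32:00", datatype=XSD.dateTime)))       # "when"
g.add((commit, PROV.generated, concept))                             # "what"
g.add((commit, EX.reason, Literal("Added FEV1 unit and description")))  # "why"

print(g.serialize(format="turtle"))
```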


Subjects
Biological Ontologies, Data Warehousing, Humans, Metadata, Prospective Studies, Retrospective Studies