Results 1 - 20 of 30
1.
Sci Data ; 11(1): 503, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755173

ABSTRACT

Nanomaterials hold great promise for improving our society, and it is crucial to understand their effects on biological systems in order to enhance their properties and ensure their safety. However, the lack of consistency in experimental reporting, the absence of universally accepted machine-readable metadata standards, and the challenge of combining such standards hamper the reusability of previously produced data for risk assessment. Fortunately, the research community has responded to these challenges by developing minimum reporting standards that address several of these issues. By converting twelve published minimum reporting standards into a machine-readable representation using FAIR maturity indicators, we have created a machine-friendly approach to annotate and assess datasets' reusability according to those standards. Furthermore, our NanoSafety Data Reusability Assessment (NSDRA) framework includes a metadata generator web application that can be integrated into experimental data management, and a new web application that can summarize the reusability of nanosafety datasets for one or more subsets of maturity indicators, tailored to specific computational risk assessment use cases. This approach enhances the transparency, communication, and reusability of experimental data and metadata. With this improved FAIR approach, we can facilitate the reuse of nanosafety research for exploration, toxicity prediction, and regulation, thereby advancing the field and benefiting society as a whole.


Subjects
Nanostructures , Metadata , Nanostructures/toxicity , Risk Assessment
3.
BMC Med Inform Decis Mak ; 24(1): 27, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38291386

ABSTRACT

BACKGROUND: Synthetic data is an emerging approach for addressing legal and regulatory concerns in biomedical research that deals with personal and clinical data, whether as a single tool or through its combination with other privacy enhancing technologies. Generating uncompromised synthetic data could significantly benefit external researchers performing secondary analyses by providing unlimited access to information while fulfilling pertinent regulations. However, the original data to be synthesized (e.g., data acquired in Living Labs) may consist of subjects' metadata (static) and a longitudinal component (set of time-dependent measurements), making it challenging to produce coherent synthetic counterparts. METHODS: Three synthetic time series generation approaches were defined and compared in this work: only generating the metadata and coupling it with the real time series from the original data (A1), generating both metadata and time series separately to join them afterwards (A2), and jointly generating both metadata and time series (A3). The comparative assessment of the three approaches was carried out using two different synthetic data generation models: the Wasserstein GAN with Gradient Penalty (WGAN-GP) and the DöppelGANger (DGAN). The experiments were performed with three different healthcare-related longitudinal datasets: Treadmill Maximal Effort Test (TMET) measurements from the University of Malaga (1), a hypotension subset derived from the MIMIC-III v1.4 database (2), and a lifelogging dataset named PMData (3). RESULTS: Three pivotal dimensions were assessed on the generated synthetic data: resemblance to the original data (1), utility (2), and privacy level (3). The optimal approach fluctuates based on the assessed dimension and metric. CONCLUSION: The initial characteristics of the datasets to be synthesized play a crucial role in determining the best approach. 
Coupling synthetic metadata with real time series (A1), as well as jointly generating synthetic time series and metadata (A3), are both competitive methods, while separately generating time series and metadata (A2) appears to perform more poorly overall.


Subjects
Metadata , Privacy , Humans , Time Factors , Databases, Factual
4.
Sci Data ; 10(1): 633, 2023 09 18.
Article in English | MEDLINE | ID: mdl-37723189

ABSTRACT

The field of human action recognition has made great strides in recent years, much helped by the availability of a wide variety of datasets that use Kinect to record human movement. Conversely, progress towards the use of Kinect in clinical practice has been hampered by the lack of appropriate data, in particular datasets that contain clinically significant movements and appropriate metadata. This paper proposes a dataset to address this issue, namely KINECAL. It contains the recordings of 90 individuals carrying out 11 movements commonly used in the clinical assessment of balance. The dataset contains relevant metadata, including clinical labelling, falls history labelling and postural sway metrics. KINECAL should be of interest to researchers working on the clinical use of motion capture and motion analysis.


Subjects
Movement , Humans , Benchmarking , Metadata , Motion , Risk Assessment , Accidental Falls
5.
Metabolomics ; 18(12): 97, 2022 11 27.
Article in English | MEDLINE | ID: mdl-36436113

ABSTRACT

INTRODUCTION: The structural identification of metabolites represents one of the current bottlenecks in non-targeted liquid chromatography-mass spectrometry (LC-MS) based metabolomics. The Metabolomics Standard Initiative has developed a multilevel system to report confidence in metabolite identification, which involves the use of MS, MS/MS and orthogonal data. Limitations due to similar or same fragmentation pattern (e.g. isomeric compounds) can be overcome by the additional orthogonal information of the retention time (RT), since it is a system property that is different for each chromatographic setup. OBJECTIVES: In contrast to MS data, sharing of RT data is not as widespread. The quality of data and its (re-)useability depend very much on the quality of the metadata. We aimed to evaluate the coverage and quality of this metadata from public metabolomics repositories. METHODS: We acquired an overview on the current reporting of chromatographic separation conditions. For this purpose, we defined the following information as important details that have to be provided: column name and dimension, flow rate, temperature, composition of eluents and gradient. RESULTS: We found that 70% of descriptions of the chromatographic setups are incomplete (according to our definition) and an additional 10% of the descriptions contained ambiguous and/or incorrect information. Accordingly, only about 20% of the descriptions allow further (re-)use of the data, e.g. for RT prediction. Therefore, we have started to develop a unified and standardized notation for chromatographic metadata with detailed and specific description of eluents, columns and gradients. CONCLUSION: Reporting of chromatographic metadata is currently not unified. Our recommended suggestions for metadata reporting will enable more standardization and automatization in future reporting.
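
The completeness criterion described in this abstract (column name and dimension, flow rate, temperature, eluent composition, and gradient) can be illustrated with a minimal sketch. The field names and the example record are hypothetical; they are not taken from the paper or from any repository schema, and the check only tests whether a field is filled, not whether its content is correct or unambiguous.

```python
# Hypothetical field names: the abstract lists the required details
# but does not define a schema for them.
REQUIRED_FIELDS = (
    "column_name",
    "column_dimension",
    "flow_rate",
    "temperature",
    "eluent_composition",
    "gradient",
)


def missing_fields(record):
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def is_complete(record):
    """A description counts as complete only if every required field is filled."""
    return not missing_fields(record)


# Illustrative record with invented values; temperature and gradient are not reported.
example = {
    "column_name": "C18",
    "column_dimension": "100 x 2.1 mm",
    "flow_rate": "0.3 mL/min",
    "eluent_composition": "A: water, B: acetonitrile",
}
```

Calling `missing_fields(example)` lists the two unreported details, so this record would fall into the "incomplete" group under the abstract's definition.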


Subjects
Metabolomics , Metadata , Tandem Mass Spectrometry , Chromatography, Liquid , Temperature
6.
Gigascience ; 11, 2022 11 21.
Article in English | MEDLINE | ID: mdl-36409836

ABSTRACT

The Common Fund Data Ecosystem (CFDE) has created a flexible system of data federation that enables researchers to discover datasets from across the US National Institutes of Health Common Fund without requiring that data owners move, reformat, or rehost those data. This system is centered on a catalog that integrates detailed descriptions of biomedical datasets from individual Common Fund Programs' Data Coordination Centers (DCCs) into a uniform metadata model that can then be indexed and searched from a centralized portal. This Crosscut Metadata Model (C2M2) supports the wide variety of data types and metadata terms used by individual DCCs and can readily describe nearly all forms of biomedical research data. We detail its use to ingest and index data from 11 DCCs.


Subjects
Ecosystem , Financial Management , Metadata
7.
Biomed Res Int ; 2022: 7854479, 2022.
Article in English | MEDLINE | ID: mdl-35795316

ABSTRACT

Objective: To evaluate Peruvian scientific publications in dentistry according to sex disparity (2011-2020). Methods: This was a retrospective bibliometric study. The unit of analysis was made up of Peruvian dentistry publications indexed in the Scopus database during the last 10 years. Records with metadata (410) corresponding to the period 2011-2020 were downloaded and standardized and refined by analyzing the metadata. The search strategy was developed based on the individual profiles of each Peruvian institution that has a dental school or college. It was evaluated according to the AF-ID of each institution in the Scopus database. In addition, the information provided by the Scopus SciVal tool was used. Finally, publications, impact, and collaboration indicators were used, such as total number per document, per author, average of citations, h-index, collaboration rate, number of institutions, the Source Normalized Impact per Paper indicator, the CiteScore, and the Scopus Field-Weighted Citation Impact. Results: The greatest increase was evident in 2018, with 2019 and 2020 being the maximum peak of scientific publication growth. However, sustained growth has not been evidenced in relation to the female sex. The analysis of coauthorship by the authors revealed four large clusters, of which the first three were represented by male researchers, such as Arriola-Guillen L., Mayta-Tovalino F., and Mendoza-Azpur G., and one by a female, Guerrero María E. Evaluating the national scientific publication in dentistry according to the CiteScore, it was found that most of the publications (145) from Peru were published in Q4 journals, although 90 manuscripts were published in Q1 journals. Conclusions: The Peruvian national dental publication in the last 10 years was mainly supported by male dentists, which invites us to reflect on the need to equalize opportunities so that female researchers can also reduce these gaps.


Subjects
Bibliometrics , Metadata , Dentistry , Female , Humans , Male , Peru , Retrospective Studies
8.
Sci Rep ; 12(1): 5767, 2022 04 06.
Article in English | MEDLINE | ID: mdl-35388080

ABSTRACT

Accumulation of beta-amyloid in the brain and cognitive decline are considered hallmarks of Alzheimer's disease. Knowing from previous studies that these two factors can manifest in the retina, the aim was to investigate whether a deep learning method was able to predict the cognition of an individual from an RGB image of their retina and metadata. A deep learning model, EfficientNet, was used to predict cognitive scores from the Canadian Longitudinal Study on Aging (CLSA) database. The proposed model explained 22.4% of the variance in cognitive scores on the test dataset using fundus images and metadata. Metadata alone proved to be more effective in explaining the variance in the sample (20.4%) versus fundus images (9.3%) alone. Attention maps highlighted the optic nerve head as the most influential feature in predicting cognitive scores. The results demonstrate that RGB fundus images are limited in predicting cognition.
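
The "variance explained" figures in this abstract can be read against a small sketch, assuming they refer to the coefficient of determination R² (the abstract does not name the statistic explicitly). The function below is a generic illustration, not the authors' evaluation code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: "variance explained" in the abstract means R^2.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean score.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot
```

Under this reading, a value of 0.224 would mean the predictions remove 22.4% of the squared error that remains when every individual is assigned the mean cognitive score.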


Subjects
Deep Learning , Canada , Cognition , Fundus Oculi , Longitudinal Studies , Metadata
9.
J Biomed Semantics ; 13(1): 10, 2022 03 18.
Article in English | MEDLINE | ID: mdl-35303946

ABSTRACT

BACKGROUND: Health data from different specialties or domains generally have diverse formats and meanings, which can cause semantic communication barriers when these data are exchanged among heterogeneous systems. As such, this study is intended to develop a national health concept data model (HCDM) and develop a corresponding system to facilitate healthcare data standardization and centralized metadata management. METHODS: Based on 55 data sets (4640 data items) from 7 health business domains in China, a bottom-up approach was employed to build the structure and metadata for HCDM by referencing HL7 RIM. According to ISO/IEC 11179, a top-down approach was used to develop and standardize the data elements. RESULTS: HCDM adopted a three-level architecture of class, attribute and data type, and consisted of 6 classes and 15 sub-classes. Each class had a set of descriptive attributes and every attribute was assigned a data type. 100 initial data elements (DEs) were extracted from HCDM and 144 general DEs were derived from corresponding initial DEs. Domain DEs were transformed by specializing general DEs using 12 controlled vocabularies which were developed from HL7 vocabularies and actual health demands. A model-based system was successfully established to evaluate and manage the NHDD. CONCLUSIONS: HCDM provided a unified metadata reference for multi-source data standardization and management. This approach of defining health data elements was a feasible solution in healthcare information standardization to enable healthcare interoperability in China.


Subjects
Metadata , Vocabulary, Controlled , Delivery of Health Care , Semantics
10.
PLoS One ; 17(2): e0263891, 2022.
Article in English | MEDLINE | ID: mdl-35148341

ABSTRACT

Crowdfunding platforms allow entrepreneurs to publish projects and raise funds for realizing them. Hence, the question of what influences projects' fundraising success is very important. Previous studies examined various factors such as project goals and project duration that may influence the outcomes of fundraising campaigns. We present a novel model for predicting the success of crowdfunding projects in meeting their funding goals. Our model focuses on semantic features only, and its performance is comparable to that of previous models. In an additional model we developed, we examine both project metadata and project semantics, delivering a comprehensive study of factors influencing crowdfunding success. Further, we analyze a large dataset of crowdfunding project data, larger than previously reported in the literature. Finally, we show that when combining semantics and metadata, we arrive at an F1 score of 96.2%. We compare our model's accuracy to the accuracy of previous research models by applying their methods on our dataset, and demonstrate higher accuracy of our model. In addition to our scientific contribution, we provide practical recommendations that may increase project funding success chances.


Subjects
Crowdsourcing/methods , Fund Raising/methods , Algorithms , Humans , Metadata , Models, Theoretical , Semantics
11.
PLoS Comput Biol ; 17(9): e1009336, 2021 09.
Article in English | MEDLINE | ID: mdl-34550966

ABSTRACT

HIV molecular epidemiology estimates the transmission patterns from clustering genetically similar viruses. The process involves connecting genetically similar genotyped viral sequences in the network implying epidemiological transmissions. This technique relies on genotype data which is collected only from HIV-diagnosed and in-care populations and leaves many persons with HIV (PWH) who have no access to consistent care out of the tracking process. We use machine learning algorithms to learn the non-linear correlation patterns between patient metadata and transmissions between HIV-positive cases. This enables us to expand the transmission network reconstruction beyond the molecular network. We employed multiple commonly used supervised classification algorithms to analyze the San Diego Primary Infection Resource Consortium (PIRC) cohort dataset, consisting of genotypes and nearly 80 additional non-genetic features. First, we trained classification models to distinguish genetically unrelated individuals from related ones. Our results show that random forest and decision tree achieved over 80% in accuracy, precision, recall, and F1-score by only using a subset of meta-features including age, birth sex, sexual orientation, race, transmission category, estimated date of infection, and first viral load date besides genetic data. Additionally, both algorithms achieved approximately 80% sensitivity and specificity. The Area Under Curve (AUC) is reported as 97% and 94% for the random forest and decision tree classifiers, respectively. Next, we extended the models to identify clusters of similar viral sequences. Support vector machine demonstrated one order of magnitude improvement in accuracy of assigning the sequences to the correct cluster compared to a dummy uniform random classifier. These results confirm that metadata carries important information about the dynamics of HIV transmission as embedded in transmission clusters.
Hence, novel computational approaches are needed to apply the non-trivial knowledge collected from inter-individual genetic information to metadata from PWH in order to expand the estimated transmissions. We note that feature extraction alone will not be effective in identifying patterns of transmission and will result in random clustering of the data, but its utilization in conjunction with genetic data and the right algorithm can contribute to the expansion of the reconstructed network beyond individuals with genetic data.
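
The evaluation measures named in this abstract (accuracy, precision, recall or sensitivity, specificity, and F1-score) follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the usage note are illustrative and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. A ratio with a zero denominator is returned as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }
```

For example, `classification_metrics(tp=8, fp=2, tn=7, fn=3)` gives a precision of 0.8 and an accuracy of 0.75. AUC is not covered here because it requires ranked prediction scores rather than a single confusion matrix.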


Subjects
Machine Learning , Metadata , Algorithms , Cluster Analysis , Feasibility Studies , HIV Infections/epidemiology , HIV Infections/transmission , Humans
12.
RECIIS (Online) ; 15(3): 722-735, Jul.-Sep. 2021. illus., tables
Article in English | LILACS | ID: biblio-1342698

ABSTRACT

The FAIR principles have become a data management instrument for the academic and scientific community, since they provide a set of guiding principles to bring findability, accessibility, interoperability and reusability to data and metadata stewardship. Since their official publication in 2016 by Scientific Data - Nature, these principles have received worldwide recognition and have been quickly endorsed and adopted as a cornerstone of data stewardship and research policy. However, when put into practice, they occasionally result in organisational, legal and technological challenges that can lead to doubts and uncertainty as to whether the effort of implementing them is worthwhile. Soon after their publication, the European Commission and other funding agencies started to require that project proposals include a Data Management Plan (DMP) based on the FAIR principles. This paper reports on the adherence of DMPs to the FAIR principles, critically evaluating ten European DMP templates. We observed that the current FAIRness of most of these DMPs is only partly satisfactory, in that they address data best practices, findability, accessibility and sometimes preservation, but pay much less attention to metadata and interoperability.


Subjects
Humans , Metadata , Scholarly Communication , Health Information Interoperability , Data Management , Comment , Health Research Policy , Scientific Domains , Data Analysis
13.
Appl Clin Inform ; 12(4): 826-835, 2021 08.
Article in English | MEDLINE | ID: mdl-34433217

ABSTRACT

BACKGROUND: Many research initiatives aim at using data from electronic health records (EHRs) in observational studies. Participating sites of the German Medical Informatics Initiative (MII) established data integration centers to integrate EHR data within research data repositories to support local and federated analyses. To address concerns regarding possible data quality (DQ) issues of hospital routine data compared with data specifically collected for scientific purposes, we have previously presented a data quality assessment (DQA) tool providing a standardized approach to assess DQ of the research data repositories at the MIRACUM consortium's partner sites. OBJECTIVES: Major limitations of the former approach included manual interpretation of the results and hard coding of analyses, making their expansion to new data elements and databases time-consuming and error prone. We here present an enhanced version of the DQA tool by linking it to common data element definitions stored in a metadata repository (MDR), adopting the harmonized DQA framework from Kahn et al. and its application within the MIRACUM consortium. METHODS: Data quality checks were consequently aligned to a harmonized DQA terminology. Database-specific information was systematically identified and represented in an MDR. Furthermore, a structured representation of logical relations between data elements was developed to model plausibility-statements in the MDR. RESULTS: The MIRACUM DQA tool was linked to data element definitions stored in a consortium-wide MDR. Additional databases used within MIRACUM were linked to the DQ checks by extending the respective data elements in the MDR with the required information. The evaluation of DQ checks was automated. An adaptable software implementation is provided with the R package DQAstats. CONCLUSION: The enhancements of the DQA tool facilitate the future integration of new data elements and make the tool scalable to other databases and data models.
It has been provided to all ten MIRACUM partners and was successfully deployed and integrated into their respective data integration center infrastructure.


Subjects
Data Accuracy , Medical Informatics , Databases, Factual , Electronic Health Records , Metadata
15.
BMC Bioinformatics ; 22(1): 168, 2021 Mar 30.
Article in English | MEDLINE | ID: mdl-33784977

ABSTRACT

BACKGROUND: Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events. While this higher prevalence is partially due to gender-related effects, biological sex differences likely also impact drug response. Publicly available gene expression databases provide a unique opportunity for examining drug response at a cellular level. However, missingness and heterogeneity of metadata prevent large-scale identification of drug exposure studies and limit assessments of sex bias. To address this, we trained organism-specific models to infer sample sex from gene expression data, and used entity normalization to map metadata cell line and drug mentions to existing ontologies. Using this method, we inferred sex labels for 450,371 human and 245,107 mouse microarray and RNA-seq samples from refine.bio. RESULTS: Overall, we find slight female bias (52.1%) in human samples and (62.5%) male bias in mouse samples; this corresponds to a majority of mixed sex studies in humans and single sex studies in mice, split between female-only and male-only (25.8% vs. 18.9% in human and 21.6% vs. 31.1% in mouse, respectively). In drug studies, we find limited evidence for sex-sampling bias overall; however, specific categories of drugs, including human cancer and mouse nervous system drugs, are enriched in female-only and male-only studies, respectively. We leverage our expression-based sex labels to further examine the complexity of cell line sex and assess the frequency of metadata sex label misannotations (2-5%). CONCLUSIONS: Our results demonstrate limited overall sex bias, while highlighting high bias in specific subfields and underscoring the importance of including sex labels to better understand the underlying biology. We make our inferred and normalized labels, along with flags for misannotated samples, publicly available to catalyze the routine use of sex as a study variable in future analyses.


Subjects
Databases, Factual , Gene Expression , Neoplasms , Sex Factors , Animals , Bias , Female , Male , Metadata , Mice , Neoplasms/genetics
17.
F1000Res ; 9: 311, 2020.
Article in English | MEDLINE | ID: mdl-32528663

ABSTRACT

Background: Given the increasing number and heterogeneity of data repositories, an improvement and harmonisation of practice within repositories for clinical trial data is urgently needed. The objective of the study was to develop and evaluate a demonstrator repository, using a widely used repository system (DSpace), and then explore its suitability for providing access to individual participant data (IPD) from clinical research. Methods: After a study of the available options, DSpace (version 6.3) was selected as the software for developing a demonstrator implementation of a repository for clinical trial data. In total, 19 quality criteria were defined, using previous work assessing clinical data repositories as a guide, and the demonstrator implementation was then assessed with respect to those criteria. Results: Generally, the performance of the DSpace demonstrator repository in supporting sensitive personal data such as that from clinical trials was strong, with 14 requirements demonstrated (74%), including the necessary support for metadata and identifiers. Two requirements could not be demonstrated (the ability to include de-identification tools and the availability of a self-attestation system) and three requirements were only partially demonstrated (ability to provide links to de-identification tools and requirements, incorporation of a data transfer agreement in system workflow, and capability to offer managed access through application on a case by case basis). Conclusions: Technically, the system was able to support most of the pre-defined requirements, though there are areas where support could be improved. Of course, in a productive repository, appropriate policies and procedures would be needed to direct the use of the available technical features. A technical evaluation should therefore be seen as indicating a system's potential, rather than being a definite assessment of its suitability.
DSpace clearly has considerable potential in this context and appears a suitable base for further exploration of the issues around storing sensitive data.


Subjects
Clinical Trials as Topic , Metadata , Software , Humans
18.
Methods Inf Med ; 59(1): 48-56, 2020 02.
Article in English | MEDLINE | ID: mdl-32535879

ABSTRACT

BACKGROUND: There is a recognized need to improve how scholarly data are managed and accessed. The scientific community has proposed the findable, accessible, interoperable, and reusable (FAIR) data principles to address this issue. OBJECTIVE: The objective of this case study was to develop a system for improving the FAIRness of Healthcare Cost and Utilization Project's State Emergency Department Databases (HCUP's SEDD) within the context of data catalog availability. METHODS: A search tool, EDCat (Emergency Department Catalog), was designed to improve the "FAIRness" of electronic health databases and tested on datasets from HCUP-SEDD. ElasticSearch was used as a database for EDCat's search engine. Datasets were curated and defined. Searchable data dictionary-related elements and unified medical language system (UMLS) concepts were included in the curated metadata. Functionality to standardize search terms using UMLS concepts was added to the user interface. RESULTS: The EDCat system improved the overall FAIRness of HCUP-SEDD by improving the findability of individual datasets and increasing the efficacy of searches for specific data elements and data types. DISCUSSION: The databases considered for this case study were limited in number as few data distributors make the data dictionaries of datasets available. The publication of data dictionaries should be encouraged through the FAIR principles, and further efforts should be made to improve the specificity and measurability of the FAIR principles. CONCLUSION: In this case study, the distribution of datasets from HCUP-SEDD was made more FAIR through the development of a search tool, EDCat. EDCat will be evaluated and developed further to include datasets from other sources.


Subjects
Databases, Factual , Emergency Service, Hospital , Health Information Interoperability , Health Services Accessibility , Information Storage and Retrieval , Metadata
20.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32239182

ABSTRACT

Cancers arise from the accumulation of somatic genome mutations, which can be influenced by inherited genomic variants and external factors such as environmental or lifestyle-related exposure. Due to the heterogeneity of cancers, precise information about the genomic composition of germline and malignant tissues has to be correlated with morphological, clinical and extrinsic features to advance medical knowledge and treatment options. With global differences in cancer frequencies and disease types, geographic data is of importance to understand the interplay between genetic ancestry and environmental influence in cancer incidence, progression and treatment outcome. In this study, we analyzed the current landscape of oncogenomic screening publications for geographic information content and quality, to address underrepresented study populations and thereby to fill prominent gaps in our understanding of interactions between somatic variations, population genetics and environmental factors in oncogenesis. We conclude that while the use of proxy-derived geographic annotations can be useful for coarse-grained associations, the study of geo-correlated factors in cancer causation and progression will benefit from standardized geographic provenance annotations. Additionally, publication-derived geographic provenance data allowed us to highlight stark inequality in the geographies of cancer genome profiling, with a near lack of sizable studies from Africa and other large regions.


Subjects
Databases, Genetic , Gene Expression Profiling/methods , Genome, Human/genetics , Genomics/methods , Neoplasms/genetics , Data Curation/methods , Data Mining/methods , Europe , Geography , Humans , Internet , Metadata , Publications/statistics & numerical data , United States