Results 1 - 20 of 532
1.
Gigascience ; 11, 2022 Nov 21.
Article in English | MEDLINE | ID: mdl-36409836

ABSTRACT

The Common Fund Data Ecosystem (CFDE) has created a flexible system of data federation that enables researchers to discover datasets from across the US National Institutes of Health Common Fund without requiring that data owners move, reformat, or rehost those data. This system is centered on a catalog that integrates detailed descriptions of biomedical datasets from individual Common Fund Programs' Data Coordination Centers (DCCs) into a uniform metadata model that can then be indexed and searched from a centralized portal. This Crosscut Metadata Model (C2M2) supports the wide variety of data types and metadata terms used by individual DCCs and can readily describe nearly all forms of biomedical research data. We detail its use to ingest and index data from 11 DCCs.
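The core idea of the C2M2 catalog can be illustrated with a minimal sketch: per-program field mappings rewrite heterogeneous records into one shared schema, after which a single index can search across programs. All field and program names below are invented stand-ins; the real C2M2 schema is far richer.

```python
# Hypothetical per-program mappings from local field names to a common schema.
FIELD_MAPS = {
    "dcc_a": {"sample_title": "title", "assay_kind": "data_type"},
    "dcc_b": {"name": "title", "modality": "data_type"},
}

def to_common_model(dcc, record):
    """Rewrite one DCC-specific record into the shared metadata model."""
    mapping = FIELD_MAPS[dcc]
    common = {"source_dcc": dcc}
    for local_field, common_field in mapping.items():
        common[common_field] = record.get(local_field)
    return common

# Two records described in program-specific vocabularies...
catalog = [
    to_common_model("dcc_a", {"sample_title": "Liver RNA-seq", "assay_kind": "RNA-seq"}),
    to_common_model("dcc_b", {"name": "Kidney proteome", "modality": "proteomics"}),
]

# ...become searchable together once unified, without rehosting the data.
rna = [r for r in catalog if r["data_type"] == "RNA-seq"]
```

The point of the sketch is that data owners keep their own formats; only the descriptions are mapped into the uniform model that the central portal indexes.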


Subjects
Ecosystem, Financial Management, Metadata
2.
F1000Res ; 11: 638, 2022.
Article in English | MEDLINE | ID: mdl-36405555

ABSTRACT

Background: Knowing the needs of the bioimaging community with respect to research data management (RDM) is essential for identifying measures that enable adoption of the FAIR (findable, accessible, interoperable, reusable) principles for microscopy and bioimage analysis data across disciplines. As an initiative within Germany's National Research Data Infrastructure, we conducted this community survey in summer 2021 to assess the state of the art of bioimaging RDM and the community's needs. Methods: An online survey was conducted with a mixed question-type design. We created a questionnaire tailored to relevant topics of the bioimaging community, including specific questions on bioimaging methods and bioimage analysis, as well as more general questions on RDM principles and tools. 203 survey entries were included in the analysis, covering the perspectives of various life and biomedical science disciplines and of participants at different career levels. Results: The results highlight the importance and value of bioimaging RDM and data sharing. However, the practical implementation of FAIR practices is impeded by technical hurdles, lack of knowledge, and insecurity about the legal aspects of data sharing. The survey participants request metadata guidelines and annotation tools and endorse the usage of image data management platforms. At present, OMERO (Open Microscopy Environment Remote Objects) is the best-known and most widely used platform. Most respondents rely on image processing and analysis, which they regard as the most time-consuming step of the bioimage data workflow. While knowledge about and implementation of electronic lab notebooks and data management plans is limited, respondents acknowledge their potential value for data handling and publication. Conclusions: The bioimaging community acknowledges and endorses the value of RDM and data sharing. Still, there is a need for information, guidance, and standardization to foster the adoption of FAIR data handling. This survey may help inspire targeted measures to close this gap.


Subjects
Data Management, Metadata, Humans, Information Dissemination, Surveys and Questionnaires, Workflow
3.
Sci Data ; 9(1): 685, 2022 11 10.
Article in English | MEDLINE | ID: mdl-36357404

ABSTRACT

We developed pISA-tree, a straightforward and flexible data management solution for organising life science project-associated research data and metadata. pISA-tree was initiated by end-user requirements, so its strong points are practicality and low maintenance cost. It enables on-the-fly creation of an enriched directory tree structure (project/Investigation/Study/Assay) based on the ISA model, in a standardised manner, via consecutive batch files. Template-based metadata files are generated in parallel at each level, enabling guided submission of experiment metadata. pISA-tree is complemented by two R packages, pisar and seekr. pisar facilitates integration of pISA-tree datasets into bioinformatic pipelines and generation of ISA-Tab exports. seekr enables synchronisation with the FAIRDOMHub repository. The applicability of pISA-tree was demonstrated in several national and international multi-partner projects. The system thus supports findable, accessible, interoperable and reusable (FAIR) research and is in accordance with the Open Science initiative. Source code and documentation for pISA-tree are available at https://github.com/NIB-SI/pISA-tree.
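The nested project/Investigation/Study/Assay layout can be sketched in a few lines: each level is a directory with a metadata file generated alongside it. This is an illustrative reimplementation, not pISA-tree's actual batch-file logic, and the directory-name prefixes and metadata fields here are assumptions.

```python
import os
import tempfile

def create_isa_level(parent, prefix, name, metadata):
    """Create one ISA level as a prefixed directory with a metadata file."""
    path = os.path.join(parent, f"_{prefix}_{name}")
    os.makedirs(path, exist_ok=True)
    # Write key/value metadata in parallel with the directory itself.
    with open(os.path.join(path, "metadata.txt"), "w") as fh:
        for key, value in metadata.items():
            fh.write(f"{key}\t{value}\n")
    return path

# Build the four nested levels of the ISA model in a scratch directory.
root = tempfile.mkdtemp()
proj = create_isa_level(root, "p", "Demo", {"Title": "Demo project"})
inv = create_isa_level(proj, "I", "Inv01", {"Description": "Pilot"})
study = create_isa_level(inv, "S", "S01", {"Organism": "A. thaliana"})
assay = create_isa_level(study, "A", "RNAseq", {"Technology": "sequencing"})
```

Because every level carries its own metadata file, the tree itself documents the experiment, which is what makes exports to formats like ISA-Tab possible.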


Subjects
Biological Science Disciplines, Data Management, Metadata, Software, Research Design
4.
Nat Commun ; 13(1): 6736, 2022 Nov 08.
Article in English | MEDLINE | ID: mdl-36347858

ABSTRACT

There are currently >1.3 million human omics samples that are publicly available. This valuable resource remains acutely underused because discovering particular samples in this ever-growing collection remains a significant challenge. The major impediment is that sample attributes are routinely described using varied terminologies written in unstructured natural language. We propose a natural-language-processing-based machine learning approach (NLP-ML) to infer tissue and cell-type annotations for genomics samples based only on their free-text metadata. NLP-ML works by creating numerical representations of sample descriptions and using these representations as features in a supervised learning classifier that predicts tissue/cell-type terms. Our approach significantly outperforms an advanced graph-based reasoning annotation method (MetaSRA) and a baseline exact string matching method (TAGGER). Model similarities between related tissues demonstrate that NLP-ML models capture biologically meaningful signals in text. Additionally, these models correctly classify tissue-associated biological processes and diseases based on their text descriptions alone. NLP-ML models are nearly as accurate as models based on gene-expression profiles in predicting sample tissue annotations, but have the distinct capability to classify samples irrespective of the genomics experiment type based on their text metadata. Python NLP-ML prediction code and trained tissue models are available at https://github.com/krishnanlab/txt2onto.
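The overall pattern (free-text metadata → numeric vectors → supervised tissue classifier) can be shown with a toy sketch. This deliberately uses bag-of-words counts and a nearest-centroid classifier rather than the embeddings and regularized classifiers of the actual NLP-ML method, and the sample descriptions are invented.

```python
from collections import Counter

# Tiny fixed vocabulary; real systems learn representations from large corpora.
VOCAB = ["liver", "hepatocyte", "brain", "cortex", "biopsy", "tissue"]

def vectorize(description):
    """Numerical representation: bag-of-words counts over VOCAB."""
    counts = Counter(description.lower().split())
    return [counts[word] for word in VOCAB]

# Invented free-text sample descriptions with tissue labels.
train = [
    ("liver biopsy tissue", "liver"),
    ("primary hepatocyte culture liver", "liver"),
    ("brain cortex tissue", "brain"),
    ("frontal cortex biopsy brain", "brain"),
]

# Supervised step: average vector (centroid) per tissue label.
centroids = {}
for label in {lbl for _, lbl in train}:
    vecs = [vectorize(txt) for txt, lbl in train if lbl == label]
    centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]

def predict(description):
    """Assign the tissue whose centroid is nearest in vector space."""
    vec = vectorize(description)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lbl: dist(vec, centroids[lbl]))
```

Even this crude version classifies unseen descriptions by textual similarity alone, which is the property that lets such models annotate samples regardless of the experiment type that produced them.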


Subjects
Metadata, Natural Language Processing, Humans, Machine Learning, Genomics, Language
5.
Sci Data ; 9(1): 678, 2022 11 08.
Article in English | MEDLINE | ID: mdl-36347894

ABSTRACT

Recent advances in high-throughput experiments and systems biology approaches have resulted in hundreds of publications identifying "immune signatures". Unfortunately, these are often described within text, figures, or tables in a format not amenable to computational processing, thus severely hampering our ability to fully exploit this information. Here we present a data model to represent immune signatures, along with the Human Immunology Project Consortium (HIPC) Dashboard ( www.hipc-dashboard.org ), a web-enabled application to facilitate signature access and querying. The data model captures the biological response components (e.g., genes, proteins, cell types or metabolites) and metadata describing the context under which the signature was identified using standardized terms from established resources (e.g., HGNC, Protein Ontology, Cell Ontology). We have manually curated a collection of >600 immune signatures from >60 published studies profiling human vaccination responses for the current release. The system will aid in building a broader understanding of the human immune response to stimuli by enabling researchers to easily access and interrogate published immune signatures.
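A signature record in the spirit of this data model pairs the biological response components with the context metadata under which they were observed. The field names and the validation vocabulary below are invented stand-ins; the real model draws its terms from standardized resources such as HGNC and the Cell Ontology.

```python
# Hypothetical controlled vocabulary for the response component type.
KNOWN_RESPONSE_TYPES = {"gene", "protein", "cell type", "metabolite"}

def make_signature(response_type, components, tissue, exposure, comparison):
    """Build one machine-readable immune signature record."""
    if response_type not in KNOWN_RESPONSE_TYPES:
        raise ValueError(f"unknown response component type: {response_type}")
    return {
        "response_type": response_type,
        "components": sorted(components),   # e.g. HGNC gene symbols
        "tissue": tissue,                   # e.g. a Cell Ontology term
        "exposure": exposure,               # vaccine or other stimulus
        "comparison": comparison,           # contrast under which observed
    }

# An invented interferon-response signature after influenza vaccination.
sig = make_signature(
    "gene",
    ["STAT1", "IFI27", "IFIT1"],
    "whole blood",
    "influenza vaccination",
    "day 7 vs day 0",
)
```

Capturing signatures as structured records like this, instead of prose or figure legends, is what makes them queryable across studies.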


Subjects
Software, Systems Biology, Vaccination, Humans, Metadata
6.
J Histotechnol ; 45(4): 132-147, 2022 12.
Article in English | MEDLINE | ID: mdl-36317862

ABSTRACT

The central tenet of scientific research is the rigorous application of the scientific method to experimental design, analysis, interpretation, and reporting of results. In order to confer validity to a hypothesis, experimental details must be transparent and results must be reproducible. Failure to achieve this minimum indicates a deficiency in rationale, design, and/or execution, necessitating further experimental refinement or hypothesis reformulation. More importantly, rigorous application of the scientific method advances scientific knowledge by enabling others to identify weaknesses or gaps that can be exploited by new ideas or technology that inevitably extend, improve, or refine a hypothesis. Experimental details, described in manuscript materials and methods, are the principal vehicle used to communicate procedures, techniques, and resources necessary for experimental reproducibility. Recent examination of the biomedical literature has shown that many published articles lack sufficiently detailed methodological information to reproduce experiments. There are few broadly established practice guidelines and quality assurance standards in basic biomedical research. The current paper provides a framework of best practices to address the lack of reporting of detailed materials and methods that is pervasive in histological slide-based assays. Our goal is to establish a structured framework that highlights the key factors necessary for thorough collection of metadata and reporting of slide-based assays.


Subjects
Biomedical Research, Metadata, Reproducibility of Results, Research Design, Publications
7.
Sensors (Basel) ; 22(22), 2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36433560

ABSTRACT

Mobile app developers are often obliged by regulatory frameworks to provide a privacy policy in natural, comprehensible language to describe their apps' privacy practices. However, prior research has revealed that (1) not all app developers offer links to their privacy policies, and (2) even when they do, it is difficult to determine whether the link leads to a valid policy. While many prior studies have examined this issue in the Google Play Store, the situation in the Apple App Store, and the iOS ecosystem in particular, is much less clear. In this paper, we conduct the first and largest study of these issues in the iOS app store ecosystem. First, we introduce the App Privacy Policy Extractor (APPE), a system that collects and analyses the metadata of over two million apps to provide store-wide insight into the distribution of purported privacy policies and the content of the provided privacy policy links. The results show that only 58.5% of apps provide links to purported privacy policies, while 39.3% provide no policy links at all. Our investigation of the provided links shows that only 38.4% of them led to an actual privacy policy, while 61.6% did not. Further, for research purposes, we introduce the App Privacy Policy Corpus (APPC-451K), the largest app privacy policy corpus, consisting of data relating to more than 451K verified privacy policies.


Subjects
Mobile Applications, Privacy, Ecosystem, Policies, Metadata
8.
Int J Mol Sci ; 23(22), 2022 Nov 20.
Article in English | MEDLINE | ID: mdl-36430895

ABSTRACT

Here we developed KARAJ, a fast and flexible Linux command-line tool to automate the end-to-end process of querying and downloading a wide range of genomic and transcriptomic sequence data types. The input to KARAJ is a list of PMCIDs, publication URLs, or various types of accession numbers, from which it automates four tasks: firstly, it provides a summary list of the accessible datasets generated by or used in these scientific articles, enabling users to select appropriate datasets; secondly, it calculates the size of the files that users want to download and confirms the availability of adequate space on the local disk; thirdly, it generates a metadata table containing sample information and the experimental design of the corresponding study; and lastly, it enables users to download supplementary data tables attached to publications. Further, KARAJ provides a parallel downloading framework powered by Aspera Connect, which reduces downloading time significantly.
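The second of the four tasks (confirming adequate local disk space before downloading) reduces to a simple comparison, sketched below with stdlib calls. The file sizes are invented, and this is not KARAJ's actual implementation.

```python
import shutil

def enough_space(file_sizes_bytes, path="."):
    """Compare the total download size against free space at `path`."""
    total = sum(file_sizes_bytes)
    free = shutil.disk_usage(path).free
    return total, total <= free

# Three hypothetical FASTQ files of ~1 GB each.
total, ok = enough_space([1_000_000_000] * 3)
```

Doing this check up front is what lets a batch downloader fail fast instead of aborting hours into a transfer.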


Subjects
Software, Transcriptome, Genome, Genomics, Metadata
9.
BMC Genom Data ; 23(1): 79, 2022 11 12.
Article in English | MEDLINE | ID: mdl-36371151

ABSTRACT

While data sharing increases, most open data are difficult to reuse or even to identify due to the lack of related metadata. In this editorial, I discuss the importance of such metadata in the context of genomics and why they are essential to ensuring the success of data sharing.


Subjects
Information Dissemination, Metadata, Genomics
10.
Sci Data ; 9(1): 700, 2022 11 14.
Article in English | MEDLINE | ID: mdl-36376356

ABSTRACT

Research can be more transparent and collaborative by using Findable, Accessible, Interoperable, and Reusable (FAIR) principles to publish Earth and environmental science data. Reporting formats (instructions, templates, and tools for consistently formatting data within a discipline) can help make data more accessible and reusable. However, the immense diversity of data types across Earth science disciplines makes development and adoption challenging. Here, we describe 11 community reporting formats for a diverse set of Earth science (meta)data, including cross-domain metadata (dataset metadata, location metadata, sample metadata), file-formatting guidelines (file-level metadata, CSV files, terrestrial model data archiving), and domain-specific reporting formats for some biological, geochemical, and hydrological data (amplicon abundance tables, leaf-level gas exchange, soil respiration, water and sediment chemistry, sensor-based hydrologic measurements). More broadly, we provide guidelines that communities can use to create new (meta)data formats that integrate with their scientific workflows. Such reporting formats have the potential to accelerate scientific discovery and predictions by making it easier for data contributors to provide (meta)data that are more interoperable and reusable.


Subjects
Environmental Science, Research Design, Metadata, Workflow
11.
Metabolomics ; 18(12): 97, 2022 Nov 27.
Article in English | MEDLINE | ID: mdl-36436113

ABSTRACT

INTRODUCTION: The structural identification of metabolites represents one of the current bottlenecks in non-targeted liquid chromatography-mass spectrometry (LC-MS) based metabolomics. The Metabolomics Standards Initiative has developed a multilevel system to report confidence in metabolite identification, which involves the use of MS, MS/MS and orthogonal data. Limitations due to similar or identical fragmentation patterns (e.g. of isomeric compounds) can be overcome by the additional orthogonal information of the retention time (RT), since it is a system property that differs for each chromatographic setup. OBJECTIVES: In contrast to MS data, sharing of RT data is not as widespread. The quality of the data and their (re-)usability depend very much on the quality of the metadata. We aimed to evaluate the coverage and quality of this metadata in public metabolomics repositories. METHODS: We acquired an overview of the current reporting of chromatographic separation conditions. For this purpose, we defined the following information as important details that have to be provided: column name and dimension, flow rate, temperature, composition of eluents and gradient. RESULTS: We found that 70% of the descriptions of chromatographic setups are incomplete (according to our definition) and that an additional 10% contained ambiguous and/or incorrect information. Accordingly, only about 20% of the descriptions allow further (re-)use of the data, e.g. for RT prediction. Therefore, we have started to develop a unified and standardized notation for chromatographic metadata with detailed and specific descriptions of eluents, columns and gradients. CONCLUSION: Reporting of chromatographic metadata is currently not unified. Our suggestions for metadata reporting will enable more standardization and automation in future reporting.
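A completeness check against the required details listed in the Methods (column name and dimension, flow rate, temperature, eluents, gradient) can be sketched directly; the record layout and field names below are hypothetical, not a format the paper prescribes.

```python
# Required chromatographic metadata, following the paper's definition.
REQUIRED_FIELDS = {
    "column_name", "column_dimension", "flow_rate",
    "temperature", "eluents", "gradient",
}

def missing_fields(record):
    """Return the required chromatography fields that are absent or empty."""
    return sorted(
        field for field in REQUIRED_FIELDS
        if not record.get(field)
    )

# An invented deposition that omits two required details.
record = {
    "column_name": "HSS T3",
    "column_dimension": "2.1 x 100 mm, 1.8 um",
    "flow_rate": "0.3 mL/min",
    "eluents": "A: water + 0.1% FA, B: acetonitrile",
}

gaps = missing_fields(record)   # incomplete per the definition above
```

Run over a repository, a check like this yields exactly the kind of coverage statistics the study reports (complete vs. incomplete descriptions).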


Subjects
Metabolomics, Metadata, Tandem Mass Spectrometry, Liquid Chromatography, Temperature
12.
BMJ Open ; 12(11): e064362, 2022 11 22.
Article in English | MEDLINE | ID: mdl-36414312

ABSTRACT

OBJECTIVES: To support the Zika virus (ZIKV) Individual Participant Data (IPD) Consortium's efforts to harmonise and analyse IPD from ZIKV-related prospective cohort studies and surveillance-based studies of pregnant women and their infants and children, we developed and disseminated a metadata survey among ZIKV-IPD Meta-Analysis (MA) study participants to identify and provide a comprehensive overview of study-level heterogeneity in exposure, outcome and covariate ascertainment and definitions. SETTING: Cohort and surveillance studies that measured ZIKV infection during pregnancy or at birth and measured fetal, infant, or child outcomes were identified through a systematic search and consultations with ZIKV researchers and Ministries of Health from 20 countries or territories. PARTICIPANTS: Fifty-four cohort or active surveillance studies shared deidentified data for the IPD-MA and completed the metadata survey, representing 33 061 women (11 020 with ZIKV) and 18 281 children. PRIMARY AND SECONDARY OUTCOME MEASURES: Study-level heterogeneity in exposure, outcome and covariate ascertainment and definitions. RESULTS: The median study sample size was 268 (IQR = 100-698). Inclusion criteria, follow-up procedures, and exposure and outcome ascertainment were highly heterogeneous, differing meaningfully across regions and multisite studies. Enrolment duration and follow-up of children after birth varied before and after the declaration of the Public Health Emergency of International Concern (PHEIC) and according to the type of funding received. CONCLUSION: This work highlights the logistic and statistical challenges that must be addressed to account for the multiple sources of within-study and between-study heterogeneity when conducting IPD-MAs of data collected in the research response to emergent pathogens like ZIKV.


Subjects
Infectious Pregnancy Complications, Zika Virus Infection, Zika Virus, Child, Female, Humans, Infant, Newborn Infant, Pregnancy, Metadata, Parturition, Infectious Pregnancy Complications/epidemiology, Pregnant Women, Prospective Studies, Zika Virus Infection/epidemiology, Zika Virus Infection/complications, Meta-Analysis as Topic
14.
BMC Bioinformatics ; 23(1): 415, 2022 Oct 07.
Article in English | MEDLINE | ID: mdl-36207678

ABSTRACT

BACKGROUND: Transcriptional regulation is a fundamental mechanism underlying biological functions. In recent years, a broad array of RNA-Seq tools have been used to measure transcription levels in biological experiments, in whole organisms, tissues, and at the single-cell level. Collectively, this is a vast comparative dataset on transcriptional processes across organisms. Yet, due to technical differences between the studies (sequencing, experimental design, and analysis), extracting usable comparative information and conducting meta-analyses remains challenging. RESULTS: We introduce the Comparative RNA-Seq Metadata Analysis Pipeline (CoRMAP), a meta-analysis tool to retrieve comparative gene expression data from any RNA-Seq dataset using de novo assembly, standardized gene expression tools and the implementation of OrthoMCL, a gene orthology search algorithm. It employs orthogroup assignments to ensure the accurate comparison of gene expression levels between experiments and species. Here we demonstrate the use of CoRMAP on two mouse brain transcriptomes of similar scope that were collected several years apart using different sequencing technologies and analysis methods. We also compare the performance of CoRMAP with a previously published functional mapping tool. CONCLUSION: CoRMAP provides a framework for the meta-analysis of RNA-Seq data from divergent taxonomic groups. This method facilitates the retrieval and comparison of gene expression levels from published datasets using standardized assembly and analysis. CoRMAP does not rely on reference genomes and consequently facilitates direct comparison between diverse studies on a range of organisms.


Subjects
Metadata, Transcriptome, Animals, Gene Expression Profiling/methods, Gene Expression Regulation, Mice, RNA-Seq, RNA Sequence Analysis/methods
15.
Sci Data ; 9(1): 659, 2022 10 28.
Article in English | MEDLINE | ID: mdl-36307424

ABSTRACT

Metadata describe information about a dataset's source, type of creation, structure, status and semantics, and are a prerequisite for the preservation and reuse of medical data. To overcome the hurdle of disparate data sources and repositories with heterogeneous data formats, a metadata crosswalk was initiated, based on existing standards. The FAIR Principles were included, as well as data format specifications. The metadata crosswalk is the foundation of data provision between a Medical Data Integration Center (MeDIC) and researchers, providing a selection of metadata information for research design and requests. Based on the crosswalk, metadata items were prioritized and categorized to demonstrate that no single predefined standard meets all requirements of a MeDIC and that only a maximal metadata set is suitable for use. The development of a convergence format including this maximal set is the anticipated solution for automated transformation of metadata in a MeDIC.


Subjects
Information Storage and Retrieval, Metadata, Semantics, Reference Standards
16.
Elife ; 11, 2022 10 04.
Article in English | MEDLINE | ID: mdl-36193886

ABSTRACT

The neurophysiology of cells and tissues is monitored electrophysiologically and optically in diverse experiments and species, ranging from flies to humans. Understanding the brain requires integration of data across this diversity, and thus these data must be findable, accessible, interoperable, and reusable (FAIR). This requires a standard language for data and metadata that can coevolve with neuroscience. We describe design and implementation principles for a language for neurophysiology data. Our open-source software (Neurodata Without Borders, NWB) defines and modularizes the interdependent, yet separable, components of a data language. We demonstrate NWB's impact through unified description of neurophysiology data across diverse modalities and species. NWB exists in an ecosystem, which includes data management, analysis, visualization, and archive tools. Thus, the NWB data language enables reproduction, interchange, and reuse of diverse neurophysiology data. More broadly, the design principles of NWB are generally applicable to enhance discovery across biology through data FAIRness.


The brain is an immensely complex organ which regulates many of the behaviors that animals need to survive. To understand how the brain works, scientists monitor and record brain activity under different conditions using a variety of experimental techniques. These neurophysiological studies are often conducted on multiple types of cells in the brain as well as a variety of species, ranging from mice to flies, or even frogs and worms. Such a range of approaches provides us with highly informative, complementary 'views' of the brain. However, to form a complete, coherent picture of how the brain works, scientists need to be able to integrate all the data from these different experiments. For this to happen effectively, neurophysiology data need to meet certain criteria: namely, they must be findable, accessible, interoperable, and re-usable (or FAIR for short). However, the sheer diversity of neurophysiology experiments impedes the 'FAIR'-ness of the information obtained from them. To overcome this problem, researchers need a standardized way to communicate their experiments and share their results: in other words, a 'standard language' to describe neurophysiology data. Rübel, Tritt, Ly, Dichter, Ghosh et al. therefore set out to create such a language that was not only FAIR, but could also co-evolve with neurophysiology research. First, they produced a computer software program (called Neurodata Without Borders, or NWB for short) which generated and defined the different components of the new standard language. Then, other tools for data management were created to expand the NWB platform using the standardized language. This included data analysis and visualization methods, as well as an 'archive' to store and access data. Testing the new language and associated tools showed that they indeed allowed researchers to access, analyze, and share information from many different types of experiments, in organisms ranging from flies to humans.
The NWB software is open-source, meaning that anyone can obtain a copy and make changes to it. Thus, NWB and its associated resources provide the basis for a collaborative, community-based system for sharing neurophysiology data. Rübel et al. hope that NWB will inspire similar developments across other fields of biology that share similar levels of complexity with neurophysiology.


Subjects
Data Science, Ecosystem, Humans, Metadata, Neurophysiology, Software
17.
Stud Hist Philos Sci ; 96: 125-134, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36272271

ABSTRACT

Otto Neurath's role in the so-called protocol sentence debates is typically framed as primarily an epistemologically radical rejection of empiricist foundationalism. However, less well recognized is that from this debate, Neurath emerges with a conception of protocol statements that functions as a radical reconceptualization of evidence. Whilst recognizably still empiricist, Neurath's conception of evidence breaks with many of the key assumptions that predominate within the empiricist tradition. In rejecting the assumption of an epistemologically privileged relationship between an observer and their own observation reports, Neurath shifts the emphasis onto the importance of contextualizing information that guarantees the stability of observation reports. In so doing, he not only provides a conception of evidence better suited to the actual role of evidence in science, but also anticipates contemporary discussion of the importance of evidential metadata.


Subjects
Metadata
18.
Database (Oxford) ; 2022, 2022 10 08.
Article in English | MEDLINE | ID: mdl-36208225

ABSTRACT

Similar to managing software packages, managing the ontology life cycle involves multiple complex workflows, such as preparing releases, continuous quality control checking, and dependency management. To manage these processes, a diverse set of tools is required, from command-line utilities to powerful ontology-engineering environments. Particularly in the biomedical domain, which has developed a set of highly diverse yet inter-dependent ontologies, standardizing release practices and metadata and establishing shared quality standards are crucial to enable interoperability. The Ontology Development Kit (ODK) provides a set of standardized, customizable and automatically executable workflows, and packages all required tooling in a single Docker image. In this paper, we provide an overview of how the ODK works, show how it is used in practice and describe how we envision it driving standardization efforts in our community. Database URL: https://github.com/INCATools/ontology-development-kit.


Subjects
Biological Ontologies, Factual Databases, Metadata, Quality Control, Software, Workflow
20.
PLoS One ; 17(9): e0274114, 2022.
Article in English | MEDLINE | ID: mdl-36084118

ABSTRACT

Analysis of language geography is increasingly being used for studying spatial patterns of social dynamics. This trend is fueled by social media platforms such as Twitter which provide access to large amounts of natural language data combined with geolocation and user metadata enabling reconstruction of detailed spatial patterns of language use. Most studies are performed on large spatial scales associated with countries and regions, where language dynamics are often dominated by the effects of geographic and administrative borders. Extending to smaller, urban scales, however, allows visualization of spatial patterns of language use determined by social dynamics within the city, providing valuable information for a range of social topics from demographic studies to urban planning. So far, few studies have been made in this domain, due, in part, to the challenges in developing algorithms that accurately classify linguistic features. Here we extend urban-scale geographical analysis of language use beyond lexical meaning to include other sociolinguistic markers that identify language style, dialect and social groups. Some features, which have not been explored with social-media data on the urban scale, can be used to target a range of social phenomena. Our study focuses on Twitter use in Buenos Aires and our approach classifies tweets based on contrasting sets of tokens manually selected to target precise linguistic features. We perform statistical analyses of eleven categories of language use to quantify the presence of spatial patterns and the extent to which they are socially driven. We then perform the first comparative analysis assessing how the patterns and strength of social drivers vary with category. Finally, we derive plausible explanations for the patterns by comparing them with independently generated maps of geosocial context. Identifying these connections is a key aspect of the social-dynamics analysis which has so far received insufficient attention.


Subjects
Social Media, Cities, Data Collection, Humans, Linguistics, Metadata