Results 1 - 10 of 10
1.
Sci Data ; 11(1): 442, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38702332

ABSTRACT

Science funders, publishers, and data archives make decisions about how to responsibly allocate resources to maximize the reuse potential of research data. This paper introduces a dataset developed to measure the impact of archival and data curation decisions on data reuse. The dataset describes 10,605 social science research datasets, their curation histories, and reuse contexts in 94,755 publications that cover 59 years from 1963 to 2022. The dataset was constructed from study-level metadata, citing publications, and curation records available through the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. The dataset includes information about study-level attributes (e.g., PIs, funders, subject terms); usage statistics (e.g., downloads, citations); archiving decisions (e.g., curation activities, data transformations); and bibliometric attributes (e.g., journals, authors) for citing publications. This dataset provides information on factors that contribute to long-term data reuse, which can inform the design of effective evidence-based recommendations to support high-impact research data curation decisions.
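
As a minimal illustration of how the dataset described above might be explored, the Python sketch below joins a study-level metadata table to its citing-publication records; the file names and column names (study_id, downloads, n_citations) are assumptions for illustration, not the dataset's actual layout.

    # Hypothetical sketch: file and column names are assumed, not taken from
    # the published dataset's documentation.
    import pandas as pd

    studies = pd.read_csv("icpsr_studies.csv")          # one row per archived study
    citations = pd.read_csv("citing_publications.csv")  # one row per citing publication

    # Count citing publications per study and join onto study-level attributes.
    cites = citations.groupby("study_id").size().rename("n_citations").reset_index()
    merged = studies.merge(cites, on="study_id", how="left").fillna({"n_citations": 0})

    # A first look at whether usage statistics track long-term reuse.
    print(merged[["downloads", "n_citations"]].corr(method="spearman"))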

2.
Appl Ontol ; 17(2): 321-336, 2022.
Article in English | MEDLINE | ID: mdl-36312514

ABSTRACT

The purpose of this study was to evaluate, revise, and extend the Informed Consent Ontology (ICO) for expressing clinical permissions, including reuse of residual clinical biospecimens and health data. This study followed a formative evaluation design and used a bottom-up modeling approach. Data were collected from the literature on US federal regulations and from a study of clinical consent forms. Eleven federal regulations and fifteen permission-sentences from clinical consent forms were iteratively modeled to identify entities and their relationships, followed by community reflection and negotiation based on a series of predetermined evaluation questions. ICO included fifty-two classes and twelve object properties necessary for the modeling, demonstrating the appropriateness of extending ICO for the clinical domain. Twenty-six additional classes were imported into ICO from other ontologies, and twelve new classes were recommended for development. This work addresses a critical gap in formally representing clinical permissions, including reuse of residual clinical biospecimens and health data. It makes missing content available to the OBO Foundry, enabling use alongside other widely adopted biomedical ontologies. ICO serves as a machine-interpretable and interoperable tool for responsible reuse of residual clinical biospecimens and health data at scale.
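
For readers who want to inspect the extended ontology directly, a minimal sketch with owlready2 follows; it assumes the ontology resolves at its OBO Foundry PURL and that the third-party owlready2 package is installed.

    # Minimal sketch: load ICO and list classes whose labels mention "permission".
    from owlready2 import get_ontology

    ico = get_ontology("http://purl.obolibrary.org/obo/ico.owl").load()

    for cls in ico.classes():
        labels = [str(label) for label in cls.label]
        if any("permission" in label.lower() for label in labels):
            print(cls.iri, labels)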

3.
West J Nurs Res ; 44(11): 1068-1081, 2022 11.
Article in English | MEDLINE | ID: mdl-34238076

ABSTRACT

Nurse scientists are increasingly interested in conducting secondary research using real-world collections of biospecimens and health data. The purposes of this scoping review are to (a) identify federal regulations and norms that bear authority or give guidance over reuse of residual clinical biospecimens and health data, (b) summarize domain experts' interpretations of the permissions governing such reuse, and (c) summarize key issues for interpreting regulations and norms. The final analysis included 25 manuscripts and 23 regulations and norms. This review illustrates the contextual complexity of reusing residual clinical biospecimens and health data, and explores issues such as privacy, confidentiality, and deriving genetic information from biospecimens. Inconsistencies make it difficult to interpret which regulations or norms apply, or whether applicable regulations or norms are congruent. Tools are necessary to support consistent, expert-informed consent processes and downstream reuse of residual clinical biospecimens and health data by nurse scientists.


Subjects
Confidentiality, Informed Consent, Humans
4.
Appl Clin Inform ; 12(3): 429-435, 2021 05.
Article in English | MEDLINE | ID: mdl-34161986

ABSTRACT

BACKGROUND: The lack of machine-interpretable representations of consent permissions precludes development of tools that act upon permissions across information ecosystems, at scale. OBJECTIVES: To report the process, results, and lessons learned while annotating permissions in clinical consent forms. METHODS: We conducted a retrospective analysis of clinical consent forms. We developed an annotation scheme following the MAMA (Model-Annotate-Model-Annotate) cycle and evaluated interannotator agreement (IAA) using observed agreement (Ao), weighted kappa (κw), and Krippendorff's α. RESULTS: The final dataset included 6,399 sentences from 134 clinical consent forms. Complete agreement was achieved for 5,871 sentences, including 211 positively identified and 5,660 negatively identified as permission-sentences across all three annotators (Ao = 0.944, Krippendorff's α = 0.599). These values reflect moderate to substantial IAA. Although permission-sentences share a set of common words and structure, disagreements between annotators are largely explained by lexical variability and ambiguity in sentence meaning. CONCLUSION: Our findings point to the complexity of identifying permission-sentences within clinical consent forms. We present our results in light of lessons learned, which may serve as a launching point for developing tools for automated permission extraction.


Subjects
Consent Forms, Retrospective Studies
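
Relating to the agreement statistics reported in the entry above, the toy Python sketch below shows one way to compute observed agreement and Krippendorff's α for three annotators and binary permission-sentence labels; the label values are invented, and the third-party krippendorff package is assumed to be installed.

    import numpy as np
    import krippendorff  # third-party package, assumed available

    # Rows = annotators, columns = sentences; 1 = permission-sentence, 0 = not.
    labels = np.array([
        [1, 0, 0, 1, 0, 0],
        [1, 0, 0, 1, 0, 1],
        [1, 0, 0, 1, 0, 0],
    ])

    # Observed agreement: fraction of sentences on which all annotators agree.
    a_o = np.mean(np.all(labels == labels[0], axis=0))

    alpha = krippendorff.alpha(reliability_data=labels, level_of_measurement="nominal")
    print(f"A_o = {a_o:.3f}, Krippendorff's alpha = {alpha:.3f}")
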
5.
PeerJ ; 7: e6142, 2019.
Article in English | MEDLINE | ID: mdl-30627489

ABSTRACT

Aligning sequences for phylogenetic analysis (multiple sequence alignment; MSA) is an important but increasingly computationally expensive step given the recent surge in DNA sequence data. Much of this sequence data is publicly available, but it can be extremely fragmentary (i.e., a combination of full genomes and genomic fragments), which can compound the computational issues related to MSA. Traditionally, alignments are produced with automated algorithms and then checked and/or corrected "by eye" prior to phylogenetic inference. However, this manual curation is inefficient at the data scales required of modern phylogenetics and results in alignments that are not reproducible. Recently, methods have been developed for fully automating alignments of large data sets, but it is unclear whether these methods produce alignments that result in compatible phylogenies when compared to more traditional approaches that combine automated and manual methods. Here we use approximately 33,000 publicly available sequences from the hepatitis B virus (HBV), a globally distributed and rapidly evolving virus, to compare different alignment approaches. Using one data set composed exclusively of whole genomes and a second that also included sequence fragments, we compared three MSA methods: (1) a purely automated approach using traditional software, (2) an automated approach followed by manual "by eye" editing, and (3) more recent fully automated approaches. To understand how these methods affect phylogenetic results, we compared the resulting tree topologies across alignment methods using multiple metrics. We further determined whether the monophyly of existing HBV genotypes was supported in phylogenies estimated from each alignment type and under different statistical support thresholds. Traditional and fully automated alignments produced similar HBV phylogenies. Although there was variability between branch support thresholds, allowing lower support thresholds tended to result in more differences among trees. Therefore, differences between the trees could be best explained by phylogenetic uncertainty unrelated to the MSA method used. Nevertheless, automated alignment approaches did not require human intervention and were therefore considerably less time-intensive than traditional approaches. Because of this, we conclude that fully automated MSA algorithms are fully compatible with older methods even for extremely difficult-to-align data sets. Additionally, we found that most HBV diagnostic genotypes did not correspond to evolutionarily sound groups, regardless of alignment type and support threshold. This suggests there may be errors in genotype classification in the database, or that HBV genotypes may need revision.
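
One way to carry out a tree-topology comparison of the kind described above is with the Robinson-Foulds distance; the sketch below uses the dendropy library, which is an assumption here (the study's exact metrics are not listed in the abstract), and the Newick file names are placeholders.

    import dendropy
    from dendropy.calculate import treecompare

    # Trees must share one TaxonNamespace to be comparable.
    taxa = dendropy.TaxonNamespace()
    tree_manual = dendropy.Tree.get(path="hbv_manual.tre", schema="newick", taxon_namespace=taxa)
    tree_auto = dendropy.Tree.get(path="hbv_automated.tre", schema="newick", taxon_namespace=taxa)

    # Unweighted Robinson-Foulds (symmetric difference) distance between topologies.
    print("RF distance:", treecompare.symmetric_difference(tree_manual, tree_auto))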

6.
PLoS One ; 13(5): e0197325, 2018.
Article in English | MEDLINE | ID: mdl-29746592

ABSTRACT

The widespread use of social media has created a valuable but underused source of data for the environmental sciences. We demonstrate the potential for images posted to the website Twitter to capture variability in vegetation phenology across United States National Parks. We process a subset of images posted to Twitter within eight U.S. National Parks, with the aim of understanding the amount of green vegetation in each image. Analysis of the relative greenness of the images shows statistically significant seasonal cycles across most National Parks at the 95% confidence level, consistent with springtime green-up and fall senescence. Additionally, these social media-derived greenness indices correlate with monthly mean satellite NDVI (r = 0.62), reinforcing the potential value these data could provide in constraining models and observing regions with limited high-quality scientific monitoring.


Subjects
Ecology/methods, Recreational Parks, Plants, Social Media, Color, Computer-Assisted Image Processing/methods, Seasons, Spacecraft, United States
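
As a rough sketch of a photo-based greenness measure of the kind used in the entry above, the function below computes the green chromatic coordinate (GCC) of an RGB image with Pillow and NumPy; GCC is a common choice in photo phenology and is an assumption here, not necessarily the study's exact index, and the file name is a placeholder.

    import numpy as np
    from PIL import Image

    def green_chromatic_coordinate(path: str) -> float:
        """Mean G / (R + G + B) over all pixels of an RGB image."""
        rgb = np.asarray(Image.open(path).convert("RGB"), dtype=float)
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        total = r + g + b
        total[total == 0] = 1.0  # avoid division by zero on pure-black pixels
        return float(np.mean(g / total))

    # Monthly means of such values could then be correlated with satellite NDVI,
    # e.g. via scipy.stats.pearsonr(monthly_gcc, monthly_ndvi).
    print(green_chromatic_coordinate("park_photo.jpg"))  # placeholder file name
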
7.
PLoS One ; 12(3): e0172090, 2017.
Article in English | MEDLINE | ID: mdl-28253269

ABSTRACT

Site-Based Data Curation (SBDC) is an approach to managing research data that prioritizes sharing and reuse of data collected at scientifically significant sites. The SBDC framework is based on geobiology research at natural hot spring sites in Yellowstone National Park as an exemplar case of high-value field data in contemporary, cross-disciplinary earth systems science. Through stakeholder analysis and investigation of data artifacts, we determined that meaningful and valid reuse of digital hot spring data requires systematic documentation of sampling processes and particular contextual information about the site of data collection. We propose a Minimum Information Framework for recording the necessary metadata on sampling locations, with anchor measurements and description of the hot spring vent distinct from the outflow system, and multi-scale field photography to capture vital information about hot spring structures. The SBDC framework can serve as a global model for the collection and description of hot spring field data and can be readily adapted for application to the curation of data from other kinds of scientifically significant sites.


Subjects
Data Curation/methods, Hot Springs, Data Curation/standards, Reference Standards
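
To make the Minimum Information Framework of the entry above concrete, the Python dictionary below sketches the kind of record it calls for; every field name and value here is a hypothetical placeholder for illustration, not the framework's actual element names.

    hot_spring_record = {
        "site": "Yellowstone National Park",
        "feature_name": "example spring",                  # placeholder
        "anchor_measurement": {                            # fixed reference point for the feature
            "latitude": 44.5, "longitude": -110.8, "datum": "WGS84",
        },
        "vent": {"temperature_c": 92.0, "pH": 3.1},        # vent described separately ...
        "outflow": [                                       # ... from the outflow system
            {"distance_m": 1.0, "temperature_c": 75.0, "pH": 3.4},
            {"distance_m": 5.0, "temperature_c": 55.0, "pH": 3.9},
        ],
        "field_photos": ["site_overview.jpg", "vent_closeup.jpg"],  # multi-scale photography
        "sampling_process": "filtered 50 mL of outflow water, 0.22 um filter, frozen on dry ice",
    }
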
8.
PLoS One ; 9(3): e89606, 2014.
Article in English | MEDLINE | ID: mdl-24595056

ABSTRACT

The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.


Subjects
Biodiversity, Knowledge, Semantics
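
As a toy illustration of the kind of statement the ontologies in the entry above are meant to support, the rdflib sketch below links a specimen to the sampling process that produced it and to an environmental context; all IRIs are example.org placeholders, not actual BCO, ENVO, or PCO term IRIs.

    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/terms/")  # placeholder namespace
    g = Graph()

    specimen = URIRef("http://example.org/specimen/123")
    sampling = URIRef("http://example.org/event/soil-core-2012-05-01")

    g.add((specimen, EX.outputOf, sampling))                    # specimen produced by a sampling process
    g.add((sampling, EX.environmentalContext, EX.SoilEnvironment))
    g.add((sampling, EX.samplingProtocol, Literal("5 cm soil core, sieved to 2 mm")))

    print(g.serialize(format="turtle"))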
9.
Zookeys ; (209): 219-33, 2012.
Article in English | MEDLINE | ID: mdl-22859890

ABSTRACT

Legacy data from natural history collections contain invaluable and irreplaceable information about biodiversity in the recent past, providing a baseline for detecting change and forecasting the future of biodiversity on a human-dominated planet. However, these data are often not available in formats that facilitate use and synthesis. New approaches are needed to increase the rates of digitization and data quality improvement. Notes from Nature provides one such novel approach by asking citizen scientists to help with transcription tasks. The initial web-based prototype of Notes from Nature will soon be widely available; it was developed collaboratively by biodiversity scientists, natural history collections staff, and experts in citizen science project development, programming, and visualization. This project brings together digital images representing different types of biodiversity records, including ledgers, herbarium sheets, and pinned insects from multiple projects and natural history collections. Experts in developing web-based citizen science applications then designed and built a platform for transcribing textual data and metadata from these images. The end product is a fully open-source web transcription tool built using the latest web technologies. The platform keeps volunteers engaged by initially explaining the scientific importance of the work via a short orientation, and then providing transcription "missions" of well-defined scope, along with dynamic feedback, interactivity, and rewards. Transcribed records, along with record-level and process metadata, are provided back to the institutions. While the tool is being developed with new users in mind, it can serve a broad range of needs, from novice to trained museum specialist. Notes from Nature has the potential to speed the rate at which biodiversity data are made available to a broad community of users.

10.
Zookeys ; (209): 235-53, 2012.
Article in English | MEDLINE | ID: mdl-22859891

ABSTRACT

Part diary, part scientific record, biological field notebooks often contain details necessary to understand the location and environmental conditions during collecting events. Despite their clear value for (and recent use in) global change studies, the text-mining outputs from field notebooks have been idiosyncratic to specific research projects and impossible to discover or re-use. Best practices and workflows for digitization, transcription, extraction, and integration with other sources are nascent or non-existent. In this paper, we demonstrate a workflow to generate structured outputs while also maintaining links to the original texts. The first step in this workflow was to place already digitized and transcribed field notebooks from the University of Colorado Museum of Natural History founder, Junius Henderson, on Wikisource, an open text transcription platform. Next, we created Wikisource templates to document places, dates, and taxa to facilitate annotation and wiki-linking. We then requested help from the public, through social media tools, to take advantage of volunteer efforts and energy. After three notebooks were fully annotated, content was converted into XML and annotations were extracted and cross-walked into Darwin Core compliant record sets. Finally, these record sets were vetted to provide valid taxon names via a process we call "taxonomic referencing." The result is the identification and mobilization of 1,068 observations from three of Henderson's thirteen notebooks and a publishable Darwin Core record set for use in other analyses. Although challenges remain, this work demonstrates a feasible approach to unlock observations from field notebooks that enhances their discovery and interoperability without losing the narrative context from which those observations are drawn. "Compose your notes as if you were writing a letter to someone a century in the future." Perrine and Patton (2011).
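
A minimal sketch of the annotation-to-Darwin-Core crosswalk step described above follows; the XML layout and record values are hypothetical (the notebooks' actual markup is not reproduced here), while the Darwin Core term names (eventDate, locality, scientificName, recordedBy, occurrenceRemarks) are standard.

    import csv
    import xml.etree.ElementTree as ET

    # Hypothetical annotated-transcription snippet, not Henderson's actual markup.
    snippet = """
    <entry date="1905-06-12" place="Boulder, Colorado">
      <observation taxon="Sciurus aberti">Two seen near the creek.</observation>
    </entry>
    """

    entry = ET.fromstring(snippet)
    records = []
    for obs in entry.findall("observation"):
        records.append({
            "eventDate": entry.get("date"),
            "locality": entry.get("place"),
            "scientificName": obs.get("taxon"),
            "recordedBy": "Junius Henderson",
            "occurrenceRemarks": (obs.text or "").strip(),
        })

    # Write the crosswalked records as a Darwin Core-style CSV.
    with open("henderson_dwc.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)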
