ABSTRACT
Science funders, publishers, and data archives make decisions about how to responsibly allocate resources to maximize the reuse potential of research data. This paper introduces a dataset developed to measure the impact of archival and data curation decisions on data reuse. The dataset describes 10,605 social science research datasets, their curation histories, and reuse contexts in 94,755 publications that cover 59 years from 1963 to 2022. The dataset was constructed from study-level metadata, citing publications, and curation records available through the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. The dataset includes information about study-level attributes (e.g., PIs, funders, subject terms); usage statistics (e.g., downloads, citations); archiving decisions (e.g., curation activities, data transformations); and bibliometric attributes (e.g., journals, authors) for citing publications. This dataset provides information on factors that contribute to long-term data reuse, which can inform the design of effective evidence-based recommendations to support high-impact research data curation decisions.
ABSTRACT
Large language models (LLMs) such as ChatGPT have recently attracted significant attention due to their impressive performance on many real-world tasks. These models have also demonstrated potential for facilitating various biomedical tasks. However, little is known about their potential for biomedical information retrieval, especially for identifying drug-disease associations. This study explores the potential of ChatGPT, a popular LLM, to discern drug-disease associations. We collected 2694 true drug-disease associations and 5662 false drug-disease pairs. Our approach involved creating various prompts to instruct ChatGPT to identify these associations. Across the prompt designs, ChatGPT identified drug-disease associations with an accuracy of 74.6-83.5% for the true pairs and 96.2-97.6% for the false pairs. This study shows that ChatGPT has potential for identifying drug-disease associations and may serve as a helpful tool for searching pharmacy-related information. However, the accuracy of its insights warrants comprehensive examination before it is implemented in medical practice.
ABSTRACT
Despite large public investments in facilitating the secondary use of data, there is little information about the specific factors that predict data reuse. Using data download logs from the Inter-university Consortium for Political and Social Research (ICPSR), this study examines how data properties, curation decisions, and repository funding models relate to data reuse. We find that datasets deposited by institutions, subject to many curatorial tasks, and whose access and preservation are funded externally, are used more often. Our findings confirm that investments in data collection, curation, and preservation are associated with more data reuse.
ABSTRACT
Social media data offer a rich resource for researchers interested in public health, labor economics, politics, social behaviors, and other topics. However, the scale and anonymity of these data mean that researchers often cannot obtain permission directly from users to collect and analyze their social media data. This article applies the basic ethical principle of respect for persons to consider individuals' perceptions of acceptable uses of data. We compare individuals' perceptions of acceptable uses of other types of sensitive data, such as health records and individual identifiers, with their perceptions of acceptable uses of social media data. Our survey of 1018 people shows that individuals regard their social media data as moderately sensitive and agree that it should be protected. Respondents are generally comfortable with researchers using their data in social research but prefer that researchers clearly articulate the benefits and seek explicit consent before conducting research. We argue that researchers must ensure that their research provides social benefits that justify the risks to individuals and that they must address those risks throughout the research process.
Subject(s)
Social Media, Humans, Confidentiality, Research Personnel, Public Health, Surveys and Questionnaires
ABSTRACT
The institutional review of interdisciplinary bodies of research lacks methods to systematically produce higher-level abstractions. Abstraction methods, like the "distant reading" of corpora, are increasingly important for knowledge discovery in the sciences and humanities. We demonstrate how abstraction methods complement the metrics on which research reviews currently rely. We model cross-disciplinary topics of research publications and projects emerging at multiple levels of detail in the context of an institutional review of the Earth Research Institute (ERI) at the University of California at Santa Barbara. From these, we design science maps that reveal the latent thematic structure of ERI's interdisciplinary research and enable reviewers to "read" a body of research at multiple levels of detail. We find that our approach provides decision support and reveals trends that strengthen the institutional review process by exposing regions of thematic expertise, distributions and clusters of work, and the evolution of these aspects.