Search | VHL Regional Portal

1.

The quality of data-driven hypotheses generated by inexperienced clinical researchers: A case study.

Ernst, Mytchell A; Draghi, Brooke N; Cimino, James J; Patel, Vimla L; Zhou, Yuchun; Shubrook, Jay H; De Lacalle, Sonsoles; Weaver, Aneesa; Liu, Chang; Jing, Xia.

medRxiv ; 2024 Aug 13.

Article in English | MEDLINE | ID: mdl-39185523

ABSTRACT

Objectives: We invited inexperienced clinical researchers to analyze coded health datasets and develop hypotheses. We recorded and analyzed their hypothesis generation process. All the hypotheses generated in the process were rated by the same group of seven experts using the same metrics. This case study examines the higher-quality (i.e., higher-rated) and lower-quality hypotheses and the participants who generated them. We characterized the contextual factors associated with the quality of hypotheses. Methods: All participants (i.e., clinical researchers) completed a 2-hour study session to analyze data and generate scientific hypotheses using the think-aloud method. Participants' screen activity and audio were recorded and transcribed. These transcriptions were used to measure the time used to generate each hypothesis and to code cognitive events (i.e., cognitive activities used when generating hypotheses; for example, "Seeking for Connection" describes an attempt to draw connections between data points). The hypothesis ratings by the expert panel were used as the measure of hypothesis quality during the analysis. We analyzed the factors associated with (1) the five highest-rated and (2) the five lowest-rated hypotheses and (3) the participants who generated them, including the number of hypotheses per participant, the validity of those hypotheses, the number of cognitive events used for each hypothesis, as well as the participant's research experience and basic demographics. Results: Participants who generated the five highest-rated hypotheses used similar lengths of time (difference 3:03), whereas those who generated the five lowest-rated hypotheses used more variable lengths of time (difference 7:13). The five highest-rated hypotheses also involved slightly fewer cognitive events on average than the five lowest-rated hypotheses (4 per hypothesis vs. 4.8 per hypothesis).
When we examine the participants who generated the five highest-rated and five lowest-rated hypotheses and all the hypotheses they generated during the 2-hour study sessions, the participants with the five highest-rated hypotheses again had a shorter range of time per hypothesis on average (0:03:34 vs. 0:07:17). They also used fewer cognitive events per hypothesis (3.498 vs. 4.626), had a higher valid rate (75.51% vs. 63.63%), and generally had more experience with clinical research. Conclusion: The quality of the hypotheses was shown to be associated with the time taken to generate them: taking too long or too short a time to generate a hypothesis appears to be negatively associated with its quality rating. Also, having more experience seems to correlate positively with higher hypothesis ratings and higher valid rates. Validity is a quality dimension used by the expert panel during rating. However, we acknowledge that our results are anecdotal. The effect may not be simply linear, and future research is necessary. These results underscore the multi-factor nature of hypothesis generation.

2.

Data-driven hypothesis generation among inexperienced clinical researchers: A comparison of secondary data analyses with visualization (VIADS) and other tools.

Jing, Xia; Cimino, James J; Patel, Vimla L; Zhou, Yuchun; Shubrook, Jay H; De Lacalle, Sonsoles; Draghi, Brooke N; Ernst, Mytchell A; Weaver, Aneesa; Sekar, Shriram; Liu, Chang.

J Clin Transl Sci ; 8(1): e13, 2024.

Article in English | MEDLINE | ID: mdl-38384898

ABSTRACT

Objectives: To compare how clinical researchers generate data-driven hypotheses with a visual interactive analytic tool (VIADS, a visual interactive analysis tool for filtering and summarizing large datasets coded with hierarchical terminologies) or other tools. Methods: We recruited clinical researchers and separated them into "experienced" and "inexperienced" groups. Participants were randomly assigned to a VIADS or control group within the groups. Each participant conducted a remote 2-hour study session for hypothesis generation with the same study facilitator on the same datasets by following a think-aloud protocol. Screen activities and audio were recorded, transcribed, coded, and analyzed. Hypotheses were evaluated by seven experts on their validity, significance, and feasibility. We conducted multilevel random effect modeling for statistical tests. Results: Eighteen participants generated 227 hypotheses, of which 147 (65%) were valid. The VIADS and control groups generated a similar number of hypotheses. The VIADS group took a significantly shorter time to generate one hypothesis (e.g., among inexperienced clinical researchers, 258 s versus 379 s, p = 0.046, power = 0.437, ICC = 0.15). The VIADS group received significantly lower ratings than the control group on feasibility and the combination rating of validity, significance, and feasibility. Conclusion: The role of VIADS in hypothesis generation seems inconclusive. The VIADS group took a significantly shorter time to generate each hypothesis. However, the combined validity, significance, and feasibility ratings of their hypotheses were significantly lower. Further characterization of hypotheses, including specifics on how they might be improved, could guide future tool development.

3.

How do clinical researchers generate data-driven scientific hypotheses? Cognitive events using think-aloud protocol.

Jing, Xia; Draghi, Brooke N; Ernst, Mytchell A; Patel, Vimla L; Cimino, James J; Shubrook, Jay H; Zhou, Yuchun; Liu, Chang; De Lacalle, Sonsoles.

medRxiv ; 2023 Oct 31.

Article in English | MEDLINE | ID: mdl-37961555

ABSTRACT

Objectives: This study aims to identify the cognitive events related to information use (e.g., "Analyze data", "Seek connection") during hypothesis generation among clinical researchers. Specifically, we describe hypothesis generation using cognitive event counts and compare them between groups. Methods: The participants used the same datasets, followed the same scripts, used VIADS (a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies) or other analytical tools (as control) to analyze the datasets, and came up with hypotheses while following the think-aloud protocol. Their screen activities and audio were recorded and then transcribed and coded for cognitive events. Results: The VIADS group exhibited the lowest mean number of cognitive events per hypothesis and the smallest standard deviation. The experienced clinical researchers had approximately 10% more valid hypotheses than the inexperienced group. The VIADS users among the inexperienced clinical researchers exhibited a trend similar to that of the experienced clinical researchers in terms of the number of cognitive events and their respective percentages out of all the cognitive events. The highest percentages of cognitive events in hypothesis generation were "Using analysis results" (30%) and "Seeking connections" (23%). Conclusion: VIADS helped inexperienced clinical researchers use fewer cognitive events to generate hypotheses than the control group. This suggests that VIADS may guide participants to be more structured during hypothesis generation compared with the control group. The results provide evidence to explain the shorter average time needed by the VIADS group in generating each hypothesis.

4.

Data-driven hypothesis generation among inexperienced clinical researchers: A comparison of secondary data analyses with visualization (VIADS) and other tools.

Jing, Xia; Cimino, James J; Patel, Vimla L; Zhou, Yuchun; Shubrook, Jay H; De Lacalle, Sonsoles; Draghi, Brooke N; Ernst, Mytchell A; Weaver, Aneesa; Sekar, Shriram; Liu, Chang.

medRxiv ; 2023 Oct 31.

Article in English | MEDLINE | ID: mdl-37333271

ABSTRACT

Objectives: To compare how clinical researchers generate data-driven hypotheses with a visual interactive analytic tool (VIADS, a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies) or other tools. Methods: We recruited clinical researchers and separated them into "experienced" and "inexperienced" groups. Participants were randomly assigned to a VIADS or control group within the groups. Each participant conducted a remote 2-hour study session for hypothesis generation with the same study facilitator on the same datasets by following a think-aloud protocol. Screen activities and audio were recorded, transcribed, coded, and analyzed. Hypotheses were evaluated by seven experts on their validity, significance, and feasibility. We conducted multilevel random effect modeling for statistical tests. Results: Eighteen participants generated 227 hypotheses, of which 147 (65%) were valid. The VIADS and control groups generated a similar number of hypotheses. The VIADS group took a significantly shorter time to generate one hypothesis (e.g., among inexperienced clinical researchers, 258 seconds versus 379 seconds, p = 0.046, power = 0.437, ICC = 0.15). The VIADS group received significantly lower ratings than the control group on feasibility and the combination rating of validity, significance, and feasibility. Conclusion: The role of VIADS in hypothesis generation seems inconclusive. The VIADS group took a significantly shorter time to generate each hypothesis. However, the combined validity, significance, and feasibility ratings of their hypotheses were significantly lower. Further characterization of hypotheses, including specifics on how they might be improved, could guide future tool development.

5.

A Visual Analytic Tool (VIADS) to Assist the Hypothesis Generation Process in Clinical Research: Mixed Methods Usability Study.

Jing, Xia; Patel, Vimla L; Cimino, James J; Shubrook, Jay H; Zhou, Yuchun; Draghi, Brooke N; Ernst, Mytchell A; Liu, Chang; De Lacalle, Sonsoles.

JMIR Hum Factors ; 10: e44644, 2023 Apr 27.

Article in English | MEDLINE | ID: mdl-37011112

ABSTRACT

BACKGROUND: Visualization can be a powerful tool to comprehend data sets, especially when they can be represented via hierarchical structures. Enhanced comprehension can facilitate the development of scientific hypotheses. However, the inclusion of excessive data can make visualizations overwhelming. OBJECTIVE: We developed a visual interactive analytic tool for filtering and summarizing large health data sets coded with hierarchical terminologies (VIADS). In this study, we evaluated the usability of VIADS for visualizing data sets of patient diagnoses and procedures coded in the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). METHODS: We used mixed methods in the study. A group of 12 clinical researchers participated in the generation of data-driven hypotheses using the same data sets and time frame (a 1-hour training session and a 2-hour study session) utilizing VIADS via the think-aloud protocol. The audio and screen activities were recorded remotely. A modified version of the System Usability Scale (SUS) survey and a brief survey with open-ended questions were administered after the study to assess the usability of VIADS and verify their intense usage experience with VIADS. RESULTS: The range of SUS scores was 37.5 to 87.5. The mean SUS score for VIADS was 71.88 (out of a possible 100, SD 14.62), and the median SUS was 75. The participants unanimously agreed that VIADS offers new perspectives on data sets (12/12, 100%), while 75% (8/12) agreed that VIADS facilitates understanding, presentation, and interpretation of underlying data sets. The comments on the utility of VIADS were positive and aligned well with the design objectives of VIADS. The answers to the open-ended questions in the modified SUS provided specific suggestions regarding potential improvements for VIADS, and the identified problems with usability were used to update the tool. 
CONCLUSIONS: This usability study demonstrates that VIADS is a usable tool for analyzing secondary data sets with good average usability, good SUS score, and favorable utility. Currently, VIADS accepts data sets with hierarchical codes and their corresponding frequencies. Consequently, only specific types of use cases are supported by the analytical results. Participants agreed, however, that VIADS provides new perspectives on data sets and is relatively easy to use. The VIADS functionalities most appreciated by participants were the ability to filter, summarize, compare, and visualize data. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/39414.
