Results 1 - 20 of 22
1.
JMIR Med Inform ; 12: e57164, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38904984

ABSTRACT

BACKGROUND: Vaccines serve as a crucial public health tool, although vaccine hesitancy continues to pose a significant threat to full vaccine uptake and, consequently, community health. Understanding and tracking vaccine hesitancy is essential for effective public health interventions; however, traditional survey methods present various limitations. OBJECTIVE: This study aimed to create a real-time, natural language processing (NLP)-based tool to assess vaccine sentiment and hesitancy across 3 prominent social media platforms. METHODS: We mined and curated discussions in English from Twitter (subsequently rebranded as X), Reddit, and YouTube social media platforms posted between January 1, 2011, and October 31, 2021, concerning human papillomavirus; measles, mumps, and rubella; and unspecified vaccines. We tested multiple NLP algorithms to classify vaccine sentiment into positive, neutral, or negative and to classify vaccine hesitancy using the World Health Organization's (WHO) 3Cs (confidence, complacency, and convenience) hesitancy model, conceptualizing an online dashboard to illustrate and contextualize trends. RESULTS: We compiled over 86 million discussions. Our top-performing NLP models displayed accuracies ranging from 0.51 to 0.78 for sentiment classification and from 0.69 to 0.91 for hesitancy classification. Exploratory analysis on our platform highlighted variations in online activity around vaccine sentiment and hesitancy, suggesting unique patterns for different vaccines. CONCLUSIONS: Our innovative system performs real-time analysis of sentiment and hesitancy on 3 vaccine topics across major social networks, providing crucial trend insights to assist campaigns aimed at enhancing vaccine uptake and public health.
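
As a concrete illustration of the kind of classifier this abstract describes, here is a minimal three-class sentiment baseline in Python. The study's actual models are not named in the abstract, so the TF-IDF/logistic-regression pipeline and the toy posts below are assumptions, not the authors' method.

```python
# Minimal baseline sketch, assuming a labeled corpus of (post, label) pairs;
# the study's actual models and features are not specified in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy posts standing in for the mined social media discussions.
posts = [
    "Vaccines are safe and effective.",
    "Not sure the new vaccine is worth the risk.",
    "The clinic offers HPV shots on Tuesdays.",
]
labels = ["positive", "negative", "neutral"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(posts, labels)
print(model.predict(["I finally got my MMR booster and feel great."]))
```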

2.
BMC Med Res Methodol ; 24(1): 108, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724903

ABSTRACT

OBJECTIVE: Systematic literature reviews (SLRs) are critical for life-science research. However, the manual selection and retrieval of relevant publications can be a time-consuming process. This study aims to (1) develop two disease-specific annotated corpora, one for human papillomavirus (HPV) associated diseases and the other for pneumococcal-associated pediatric diseases (PAPD), and (2) optimize machine- and deep-learning models to facilitate automation of SLR abstract screening. METHODS: This study constructed two disease-specific SLR screening corpora for HPV and PAPD, which contained citation metadata and corresponding abstracts. Performance of multiple combinations of machine- and deep-learning algorithms and features such as keywords and MeSH terms was evaluated using precision, recall, accuracy, and F1-score. RESULTS AND CONCLUSIONS: The HPV corpus contained 1697 entries, with 538 relevant and 1159 irrelevant articles. The PAPD corpus included 2865 entries, with 711 relevant and 2154 irrelevant articles. Adding features beyond title and abstract improved the accuracy of machine learning models by 3% for the HPV corpus and 2% for the PAPD corpus. Transformer-based deep learning models consistently outperformed conventional machine learning algorithms, highlighting the strength of domain-specific pre-trained language models for SLR abstract screening. This study provides a foundation for the development of more intelligent SLR systems.
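
A sketch of the feature-augmentation idea the abstract reports: appending MeSH terms and other metadata to the title and abstract before vectorization. The record fields, toy data, and choice of a linear SVM are illustrative assumptions; the paper evaluates multiple algorithms.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy records standing in for the annotated screening corpora.
records = [
    {"title": "HPV vaccine efficacy trial",
     "abstract": "Randomized trial of HPV vaccination in adolescents.",
     "mesh": ["Papillomavirus Vaccines"], "relevant": 1},
    {"title": "Gardening and wellbeing",
     "abstract": "Survey of community gardens and mood.",
     "mesh": ["Gardening"], "relevant": 0},
]

def to_text(rec, with_mesh=True):
    # Concatenate citation fields; adding MeSH terms mirrors the
    # extra-feature gain (~2-3% accuracy) the study reports.
    parts = [rec["title"], rec["abstract"]]
    if with_mesh:
        parts += rec["mesh"]
    return " ".join(parts)

X = [to_text(r) for r in records]
y = [r["relevant"] for r in records]
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(X, y)
print(clf.predict([to_text({"title": "HPV screening outcomes",
                            "abstract": "Cohort of vaccinated women.",
                            "mesh": ["Humans"]})]))
```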


Subject(s)
Machine Learning , Papillomavirus Infections , Humans , Papillomavirus Infections/diagnosis , Economics, Medical , Algorithms , Outcome Assessment, Health Care/methods , Deep Learning , Abstracting and Indexing/methods
3.
medRxiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38633810

ABSTRACT

Background: Large language models (LLMs) have shown promising performance in various healthcare domains, but their effectiveness in identifying specific clinical conditions in real medical records is less explored. This study evaluates LLMs for detecting signs of cognitive decline in real electronic health record (EHR) clinical notes, comparing their error profiles with traditional models. The insights gained will inform strategies for performance enhancement. Methods: This study, conducted at Mass General Brigham in Boston, MA, analyzed clinical notes from the four years prior to a 2019 diagnosis of mild cognitive impairment in patients aged 50 and older. We used a randomly selected, annotated sample of 4,949 note sections, filtered with keywords related to cognitive functions, for model development. For testing, a randomly selected, annotated sample of 1,996 note sections without keyword filtering was utilized. We developed prompts for two LLMs, Llama 2 and GPT-4, on HIPAA-compliant cloud-computing platforms using multiple approaches (e.g., both hard and soft prompting and error analysis-based instructions) to select the optimal LLM-based method. Baseline models included a hierarchical attention-based neural network and XGBoost. Subsequently, we constructed an ensemble of the three models using a majority vote approach. Results: GPT-4 demonstrated superior accuracy and efficiency compared to Llama 2, but did not outperform traditional models. The ensemble model outperformed the individual models, achieving a precision of 90.3%, a recall of 94.2%, and an F1-score of 92.2%. Notably, the ensemble model showed a significant improvement in precision, increasing from a range of 70%-79% to above 90%, compared to the best-performing single model. Error analysis revealed that 63 samples were incorrectly predicted by at least one model; however, only 2 cases (3.2%) were mutual errors across all models, indicating diverse error profiles among them. Conclusions: LLMs and traditional machine learning models trained using local EHR data exhibited diverse error profiles. The ensemble of these models was found to be complementary, enhancing diagnostic performance. Future research should investigate integrating LLMs with smaller, localized models and incorporating medical data and domain knowledge to enhance performance on specific tasks.
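
The majority-vote ensemble is simple enough to show directly. Below is a minimal sketch, assuming three fitted models that each output a binary label per note section; the example label vectors are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: list of per-model label lists, one inner list per model."""
    ensembled = []
    for labels in zip(*predictions):  # one tuple of three labels per sample
        ensembled.append(Counter(labels).most_common(1)[0][0])
    return ensembled

# Hypothetical outputs from GPT-4, the attention network, and XGBoost.
gpt4    = [1, 0, 1, 1]
han     = [1, 0, 0, 1]
xgboost = [0, 0, 1, 1]
print(majority_vote([gpt4, han, xgboost]))  # -> [1, 0, 1, 1]
```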

4.
J Am Med Inform Assoc ; 31(2): 375-385, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-37952206

ABSTRACT

OBJECTIVES: We aim to build a generalizable information extraction system leveraging large language models to extract granular eligibility criteria information for diverse diseases from free-text clinical trial protocol documents. We investigate the model's capability to extract criteria entities along with contextual attributes, including values, temporality, and modifiers, and present the strengths and limitations of this system. MATERIALS AND METHODS: The clinical trial data were acquired from https://ClinicalTrials.gov/. We developed a system, AutoCriteria, which comprises the following modules: preprocessing, knowledge ingestion, prompt modeling based on GPT, postprocessing, and interim evaluation. The final system evaluation was performed, both quantitatively and qualitatively, on 180 manually annotated trials encompassing 9 diseases. RESULTS: AutoCriteria achieves an overall F1 score of 89.42 across all 9 diseases in extracting the criteria entities, with the highest being 95.44 for nonalcoholic steatohepatitis and the lowest being 84.10 for breast cancer. Its overall accuracy is 78.95% in identifying all contextual information across all diseases. Our thematic analysis indicated accurate logic interpretation of criteria as one of the strengths and overlooking/neglecting the main criteria as one of the weaknesses of AutoCriteria. DISCUSSION: AutoCriteria demonstrates strong potential to extract granular eligibility criteria information from trial documents without requiring manual annotations. The prompts developed for AutoCriteria generalize well across different disease areas. Our evaluation suggests that the system handles complex scenarios, including multiple arm conditions and logics. CONCLUSION: AutoCriteria currently encompasses a diverse range of diseases and holds potential to extend to more in the future. This signifies a generalizable and scalable solution, poised to address the complexities of clinical trial application in real-world settings.
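
A hedged sketch of how a GPT-based extraction module of this kind might be wired: build a prompt asking for structured criteria, then postprocess the model's reply into JSON. The prompt wording, attribute schema, and mocked response are illustrative; the actual AutoCriteria prompts are not given in this abstract, and the LLM call itself is elided.

```python
import json

# Illustrative prompt template; not the actual AutoCriteria prompt.
PROMPT_TEMPLATE = """Extract each eligibility criterion from the trial text below.
Return a JSON list of objects with keys: entity, value, temporality, modifier.

Trial text:
{criteria_text}
"""

def build_prompt(criteria_text: str) -> str:
    return PROMPT_TEMPLATE.format(criteria_text=criteria_text)

def postprocess(llm_response: str):
    # Tolerate extra prose around the JSON payload in the model's reply.
    start, end = llm_response.find("["), llm_response.rfind("]") + 1
    return json.loads(llm_response[start:end])

# A mocked model response stands in for the actual API call.
mock_response = ('Here are the criteria: [{"entity": "HbA1c", "value": ">= 6.5%", '
                 '"temporality": "at screening", "modifier": null}]')
print(postprocess(mock_response))
```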


Subject(s)
Breast Neoplasms , Natural Language Processing , Humans , Female , Information Storage and Retrieval , Breast Neoplasms/drug therapy , Language , Eligibility Determination/methods
5.
BMC Bioinformatics ; 24(Suppl 3): 477, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38102593

ABSTRACT

BACKGROUND: With more clinical trials offering optional participation in the collection of bio-specimens for biobanking comes increasing complexity in the requirements of informed consent forms. The aim of this study is to develop an automatic natural language processing (NLP) tool to annotate informed consent documents to promote biorepository data regulation, sharing, and decision support. We collected informed consent documents from several publicly available sources, then manually annotated them, covering sentences containing permission information about the sharing of either bio-specimens or donor data, or about conducting genetic research or future research using bio-specimens or donor data. RESULTS: We evaluated a variety of machine learning algorithms, including random forest (RF) and support vector machine (SVM), for the automatic identification of these sentences. 120 informed consent documents containing 29,204 sentences were annotated, of which 1250 sentences (4.28%) provide answers to a permission question. An SVM model achieved an F1 score of 0.95 on classifying the sentences when using a gold standard, which is a prefiltered corpus containing all relevant sentences. CONCLUSIONS: This study demonstrates the feasibility of using machine learning tools to classify permission-related sentences in informed consent documents.
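
For illustration, a minimal version of the sentence-level SVM classifier in Python. The toy sentences and labels are invented, and note the paper's 0.95 F1 was measured on a gold-standard prefiltered corpus, not on training data as in this in-sample demo.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy consent-form sentences; 1 marks a permission-related sentence.
sentences = [
    "Your samples may be shared with other researchers.",
    "Your data may be used for future genetic research.",
    "You will be asked to fast before the blood draw.",
    "The visit will take about one hour.",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
clf.fit(sentences, labels)
print(f1_score(labels, clf.predict(sentences)))  # in-sample demo only
```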


Subject(s)
Biological Specimen Banks , Consent Forms , Machine Learning , Algorithms , Natural Language Processing
6.
J Am Med Inform Assoc ; 28(6): 1275-1283, 2021 06 12.
Article in English | MEDLINE | ID: mdl-33674830

ABSTRACT

The COVID-19 pandemic swept across the world rapidly, infecting millions of people. An efficient tool that can accurately recognize important clinical concepts of COVID-19 from free text in electronic health records (EHRs) will be valuable to accelerate COVID-19 clinical research. To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19 SignSym, which can extract COVID-19 signs/symptoms and their 8 attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership common data model. A hybrid approach of combining deep learning-based models, curated lexicons, and pattern-based rules was applied to quickly build the COVID-19 SignSym from CLAMP, with optimized performance. Our extensive evaluation using 3 external sites with clinical notes of COVID-19 patients, as well as the online medical dialogues of COVID-19, shows COVID-19 SignSym can achieve high performance across data sources. The workflow used for this study can be generalized to other use cases, where existing clinical natural language processing tools need to be customized for specific information needs within a short time. COVID-19 SignSym is freely accessible to the research community as a downloadable package (https://clamp.uth.edu/covid/nlp.php) and has been used by 16 healthcare organizations to support clinical research of COVID-19.
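
COVID-19 SignSym itself is built on CLAMP with deep-learning components, but the lexicon-plus-rules part of the hybrid can be sketched in a few lines. Everything below (the tiny lexicon, the negation cues, the three-word window heuristic) is a conceptual stand-in, not the tool's actual logic.

```python
import re

# Illustrative lexicon and negation cues; the real tool uses curated
# lexicons, pattern rules, and deep-learning models via CLAMP.
SYMPTOM_LEXICON = {"fever", "cough", "shortness of breath", "fatigue"}
NEGATION_CUES = re.compile(r"\b(no|denies|without)\b", re.IGNORECASE)

def extract_symptoms(sentence: str):
    found = []
    lowered = sentence.lower()
    for symptom in SYMPTOM_LEXICON:
        idx = lowered.find(symptom)
        if idx >= 0:
            # Crude negation attribute: a cue within the few words
            # immediately preceding the mention.
            window = " ".join(lowered[:idx].split()[-3:])
            negated = bool(NEGATION_CUES.search(window))
            found.append({"symptom": symptom, "negation": negated})
    return found

print(extract_symptoms("Patient denies fever but reports dry cough."))
# fever is flagged as negated; cough is asserted
```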


Subject(s)
COVID-19/diagnosis , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Deep Learning , Humans , Symptom Assessment/methods
7.
AMIA Annu Symp Proc ; 2021: 197-206, 2021.
Article in English | MEDLINE | ID: mdl-35309008

ABSTRACT

The informed consent process is a complicated procedure involving permissions as well as a variety of entities and actions. In this paper, we discuss the use of Semantic Web Rule Language (SWRL) to further extend the Informed Consent Ontology (ICO) to allow for semantic machine-based reasoning to manage and generate important permission-based information that can later be viewed by stakeholders. We present four use cases of permissions from the All of Us informed consent document and translate these permissions into SWRL expressions to extend and operationalize ICO. Our efforts show how SWRL is able to infer some of the implicit information based on the defined rules, and demonstrate the utility of ICO through the use of SWRL extensions. Future work will include developing formal and generalized rules and expressing permissions from the entire document, as well as working towards integrating ICO into software systems to enhance the semantic representation of informed consent for biomedical research.
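
To make the rule-based inference concrete: an SWRL rule has the shape antecedent -> consequent, along the lines of ConsentDocument(?d) ^ permits(?d, ?p) -> ... . The pure-Python forward-chaining sketch below mimics that pattern over a toy fact set; the class and property names are illustrative, not actual ICO terms.

```python
# Toy forward-chaining over triples; names are illustrative, not ICO terms.
facts = {
    ("doc1", "permits", "data_sharing"),
    ("doc1", "signed_by", "participant7"),
    ("doc2", "permits", "data_sharing"),  # unsigned, so nothing is inferred
}

def infer_effective_permissions(facts):
    # Rule (SWRL-style): permits(?d, data_sharing) ^ signed_by(?d, ?s)
    #                    -> grants(?d, effective_data_sharing_permission)
    signed_docs = {s for (s, p, o) in facts if p == "signed_by"}
    return {
        (s, "grants", "effective_data_sharing_permission")
        for (s, p, o) in facts
        if p == "permits" and o == "data_sharing" and s in signed_docs
    }

print(infer_effective_permissions(facts))  # only doc1 gains the inferred triple
```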


Subject(s)
Population Health , Semantic Web , Humans , Informed Consent , Language , Semantics
8.
ArXiv ; 2020 Jul 13.
Article in English | MEDLINE | ID: mdl-32908948

ABSTRACT

The COVID-19 pandemic swept across the world rapidly, infecting millions of people. An efficient tool that can accurately recognize important clinical concepts of COVID-19 from free text in electronic health records (EHRs) will be valuable to accelerate COVID-19 clinical research. To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19 SignSym, which can extract COVID-19 signs/symptoms and their 8 attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership common data model. A hybrid approach of combining deep learning-based models, curated lexicons, and pattern-based rules was applied to quickly build the COVID-19 SignSym from CLAMP, with optimized performance. Our extensive evaluation using 3 external sites with clinical notes of COVID-19 patients, as well as the online medical dialogues of COVID-19, shows COVID-19 SignSym can achieve high performance across data sources. The workflow used for this study can be generalized to other use cases, where existing clinical natural language processing tools need to be customized for specific information needs within a short time. COVID-19 SignSym is freely accessible to the research community as a downloadable package (https://clamp.uth.edu/covid/nlp.php) and has been used by 16 healthcare organizations to support clinical research of COVID-19.

9.
J Natl Compr Canc Netw ; 17(12): 1505-1511, 2019 12.
Article in English | MEDLINE | ID: mdl-31805530

ABSTRACT

BACKGROUND: Objective radiographic assessment is crucial for accurately evaluating therapeutic efficacy and patient outcomes in oncology clinical trials. Imaging assessment workflow can be complex; can vary with institution; may burden medical oncologists, who are often inadequately trained in radiology and response criteria; and can lead to high interobserver variability and investigator bias. This article reviews the development of a tumor response assessment core (TRAC) at a comprehensive cancer center with the goal of providing standardized, objective, unbiased tumor imaging assessments, and highlights the web-based platform and overall workflow. In addition, quantitative response assessments by the medical oncologists, radiologist, and TRAC are compared in a retrospective cohort of patients to determine concordance. PATIENTS AND METHODS: The TRAC workflow includes an image analyst who pre-reviews scans before review with a board-certified radiologist and then manually uploads annotated data on the proprietary TRAC web portal. Patients previously enrolled in 10 lung cancer clinical trials between January 2005 and December 2015 were identified, and the prospectively collected quantitative response assessments by the medical oncologists were compared with retrospective analysis of the same dataset by a radiologist and TRAC. RESULTS: This study enlisted 49 consecutive patients (53% female) with a median age of 60 years (range, 29-78 years); 2 patients did not meet study criteria and were excluded. A linearly weighted kappa test for concordance for TRAC versus radiologist was substantial at 0.65 (95% CI, 0.46-0.85; standard error [SE], 0.10). The kappa value was moderate at 0.42 (95% CI, 0.20-0.64; SE, 0.11) for TRAC versus oncologists and only fair at 0.34 (95% CI, 0.12-0.55; SE, 0.11) for oncologists versus radiologist. CONCLUSIONS: Medical oncologists burdened with the task of tumor measurements in patients on clinical trials may introduce significant variability and investigator bias, with the potential to affect therapeutic response and clinical trial outcomes. Institutional imaging cores may help bridge the gap by providing unbiased and reproducible measurements and enable a leaner workflow.
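
The concordance statistic used here, a linearly weighted Cohen's kappa, is directly available in common libraries. A minimal sketch with hypothetical ordinal response categories (e.g., RECIST-style CR/PR/SD/PD encoded as 0-3):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings from two assessors over eight patients.
trac        = [0, 1, 2, 2, 3, 1, 0, 2]
radiologist = [0, 1, 2, 1, 3, 1, 0, 3]

# Linear weighting penalizes disagreements by their ordinal distance.
kappa = cohen_kappa_score(trac, radiologist, weights="linear")
print(round(kappa, 2))
```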


Subject(s)
Clinical Trials as Topic/standards , Image Interpretation, Computer-Assisted/methods , Multimodal Imaging/methods , Neoplasms/pathology , Observer Variation , Oncologists/statistics & numerical data , Response Evaluation Criteria in Solid Tumors , Adult , Aged , Combined Modality Therapy , Female , Follow-Up Studies , Humans , Male , Middle Aged , Neoplasms/diagnostic imaging , Neoplasms/therapy , Prognosis , Prospective Studies , Retrospective Studies
10.
Stud Health Technol Inform ; 245: 838-842, 2017.
Article in English | MEDLINE | ID: mdl-29295217

ABSTRACT

We report on a study assessing the ability of our custom Hootation software to produce clear and accurate natural language phrases from axioms embedded in three biomedical ontologies. Using multiple domain experts and three discrete rating scales, we evaluated the tool on the clarity of the natural language produced, the fidelity of the natural language produced to the underlying axiom, and the fidelity of the domain knowledge represented by the axioms. Results show that Hootation provided relatively clear natural language equivalents for a select set of OWL axioms, although the clarity of statements hinges on the accuracy and representation of the axioms in the ontology.
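
Template-based verbalization of simple axiom types can be sketched compactly; the templates below are invented for illustration and are not Hootation's.

```python
# Illustrative templates for two common OWL axiom types.
TEMPLATES = {
    "SubClassOf": "Every {sub} is a {sup}.",
    "DisjointClasses": "No {sub} is a {sup}.",
}

def verbalize(axiom_type: str, sub: str, sup: str) -> str:
    return TEMPLATES[axiom_type].format(sub=sub, sup=sup)

print(verbalize("SubClassOf", "melanoma", "skin cancer"))
# -> "Every melanoma is a skin cancer."
```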


Subject(s)
Biological Ontologies , Natural Language Processing , Knowledge , Language , Software
12.
Nat Commun ; 6: 7138, 2015 Jul 07.
Article in English | MEDLINE | ID: mdl-26151821

ABSTRACT

Genetic susceptibility to colorectal cancer is caused by rare pathogenic mutations and common genetic variants that contribute to familial risk. Here we report the results of a two-stage association study with 18,299 cases of colorectal cancer and 19,656 controls, with follow-up of the most statistically significant genetic loci in 4,725 cases and 9,969 controls from two Asian consortia. We describe six new susceptibility loci reaching a genome-wide threshold of P<5.0E-08. These findings provide additional insight into the underlying biological mechanisms of colorectal cancer and demonstrate the scientific value of large consortia-based genetic epidemiology studies.


Subject(s)
Colorectal Neoplasms/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Case-Control Studies , Humans , Odds Ratio , Polymorphism, Single Nucleotide
13.
Curr Oncol Rep ; 14(6): 494-501, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22948276

ABSTRACT

Along with the increasing adoption of electronic health records (EHRs) are expectations that data collected within EHRs will be readily available for outcomes and comparative effectiveness research. Yet the ability to effectively share and reuse data depends on implementing and configuring EHRs with these goals in mind from the beginning. Data sharing and integration must be planned both locally as well as nationally. The rich data transmission and semantic infrastructure developed by the National Cancer Institute (NCI) for research provides an excellent example of moving beyond paper-based paradigms and exploiting the power of semantically robust, network-based systems, and engaging both domain and informatics expertise. Similar efforts are required to address current challenges in sharing EHR data.


Subject(s)
Electronic Health Records , Information Dissemination , Medical Records Systems, Computerized , Humans , Medical Informatics , Semantics
14.
AMIA Annu Symp Proc ; 2012: 321-30, 2012.
Article in English | MEDLINE | ID: mdl-23304302

ABSTRACT

In this study, we quantified the use of uncertainty expressions, referred to as 'hedge' phrases, among a corpus of 100,000 clinical documents retrieved from our institution's electronic health record system. The frequency of each hedge phrase appearing in the corpus was characterized across document types and clinical departments. We also used a natural language processing tool to identify clinical concepts that were spatially, and potentially semantically, associated with the hedge phrases identified. The objective was to delineate the prevalence of hedge phrase usage in clinical documentation, which may have a profound impact on patient care and provider-patient communication and may become a source of unintended consequences when such documents are made directly accessible to patients via patient portals.
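
Counting hedge-phrase frequencies over a note corpus is straightforward to sketch; the phrase list below is illustrative, not the study's lexicon.

```python
import re
from collections import Counter

# Illustrative hedge phrases; the study's actual lexicon is not listed here.
HEDGES = ["possibly", "likely", "cannot rule out", "suggestive of", "appears to be"]
pattern = re.compile("|".join(re.escape(h) for h in HEDGES), re.IGNORECASE)

notes = [
    "Findings are suggestive of pneumonia; cannot rule out early abscess.",
    "The lesion appears to be benign, possibly a cyst.",
]

counts = Counter(m.group(0).lower() for note in notes for m in pattern.finditer(note))
print(counts.most_common())
```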


Subject(s)
Electronic Health Records , Language , Natural Language Processing , Patient Access to Records , Humans , Medical Records Systems, Computerized , Physicians
15.
AMIA Annu Symp Proc ; 2011: 1630-8, 2011.
Article in English | MEDLINE | ID: mdl-22195229

ABSTRACT

In this study, we comparatively examined the linguistic properties of narrative clinician notes created through voice dictation versus those entered directly by clinicians via a computer keyboard. Intuitively, voice-dictated notes would resemble natural language, while typed-in notes may demonstrate distinctive language features for reasons such as heavy use of acronyms. The study analyses were based on an empirical dataset retrieved from our institutional electronic health records system. The dataset contains 30,000 voice-dictated notes and 30,000 notes that were entered manually; both were encounter notes generated in ambulatory care settings. The results suggest that there exists a considerable amount of lexical and distributional difference between the narrative clinician notes created via these two methods. Such differences could have a significant impact on the performance of natural language processing tools, necessitating that these two types of documents be treated differently.
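
Two simple lexical measures of the kind such a comparison might use are the type-token ratio within each corpus and the vocabulary overlap between them. A toy sketch follows; these measures are generic, as the abstract does not specify which statistics were used.

```python
# Toy stand-ins for the two 30,000-note corpora.
def tokens(docs):
    return [w for d in docs for w in d.lower().split()]

dictated = ["the patient reports intermittent chest pain radiating to the left arm"]
typed    = ["pt c/o cp, r/o mi, ecg ordered"]

t_d, t_t = tokens(dictated), tokens(typed)
ttr = lambda toks: len(set(toks)) / len(toks)                  # type-token ratio
jaccard = len(set(t_d) & set(t_t)) / len(set(t_d) | set(t_t))  # vocabulary overlap
print(f"TTR dictated={ttr(t_d):.2f}, typed={ttr(t_t):.2f}, vocab Jaccard={jaccard:.2f}")
```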


Subject(s)
Computer Peripherals , Electronic Health Records , Linguistics , Medical Records , Natural Language Processing , Speech Recognition Software , Humans , Narration , User-Computer Interface
16.
J Telemed Telecare ; 17(1): 36-40, 2011.
Article in English | MEDLINE | ID: mdl-21097566

ABSTRACT

We examined the feasibility of home videoconferencing for providing cancer genetic education and risk information to people at risk. Adults with possible hereditary colon or breast and ovarian cancer syndromes were offered Internet-based counselling. Participants were sent web cameras and software to install on their home PCs. They watched a prerecorded educational video and then took part in a live counselling session with a genetic counsellor. A total of 31 participants took part in Internet counselling sessions. Satisfaction with counselling was high in all domains studied, including technical (mean 4.3 on a 1-5 scale), education (mean 4.7), communication (mean 4.8), psychosocial (mean 4.1) and overall (mean 4.2). Qualitative data identified technical aspects that could be improved. All participants reported that they would recommend Internet-based counselling to others. Internet-based genetic counselling is feasible and associated with a high level of satisfaction among participants.


Subject(s)
Genetic Counseling/standards , Internet/standards , Neoplasms/genetics , Patient Education as Topic/standards , Videoconferencing/standards , Adult , Aged , Breast Neoplasms/genetics , Colonic Neoplasms/genetics , Feasibility Studies , Female , Genetic Counseling/methods , Humans , Male , Middle Aged , Ovarian Neoplasms/genetics , Patient Education as Topic/methods , Patient Satisfaction , Surveys and Questionnaires
17.
J Biomed Inform ; 42(6): 1035-45, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19497389

ABSTRACT

It is increasingly important for investigators to efficiently and effectively access, interpret, and analyze data from diverse biological, literature, and annotation sources in a unified way. The heterogeneity of biomedical data and the lack of metadata are the primary sources of difficulty in integration, presenting major challenges to effective search and retrieval of information. As a proof of concept, the Prostate Cancer Ontology (PCO) was created for the development of the Prostate Cancer Information System (PCIS). PCIS is used to demonstrate how the ontology solves the semantic heterogeneity problem arising from the integration of two prostate cancer-related database systems at the Fox Chase Cancer Center. As a result of the integration process, the semantic query language SPARQL is applied to perform integrated queries across the two database systems based on PCO.
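
An ontology-mediated SPARQL query of the sort described can be sketched with rdflib; the namespace IRI, class, and property below are placeholders rather than actual PCO terms.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Placeholder namespace; not the real PCO IRI.
PCO = Namespace("http://example.org/pco#")

g = Graph()
g.add((PCO.patient42, RDF.type, PCO.ProstateCancerPatient))
g.add((PCO.patient42, PCO.hasPSAResult, Literal(6.1)))

results = g.query("""
    PREFIX pco: <http://example.org/pco#>
    SELECT ?patient ?psa WHERE {
        ?patient a pco:ProstateCancerPatient ;
                 pco:hasPSAResult ?psa .
    }
""")
for patient, psa in results:
    print(patient, psa)
```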


Subject(s)
Computational Biology/methods , Database Management Systems , Information Storage and Retrieval/methods , Prostatic Neoplasms , Terminology as Topic , Databases, Factual , Humans , Male , Semantics , User-Computer Interface
18.
BMC Bioinformatics ; 10: 184, 2009 Jun 16.
Article in English | MEDLINE | ID: mdl-19531228

ABSTRACT

BACKGROUND: Flow cytometry technology is widely used in both health care and research. The rapid expansion of flow cytometry applications has outpaced the development of data storage and analysis tools. Collaborative efforts being taken to eliminate this gap include building common vocabularies and ontologies, designing generic data models, and defining data exchange formats. The Minimum Information about a Flow Cytometry Experiment (MIFlowCyt) standard was recently adopted by the International Society for Advancement of Cytometry. This standard guides researchers on the information that should be included in peer-reviewed publications, but it is insufficient for data exchange and integration between computational systems. The Functional Genomics Experiment (FuGE) formalizes common aspects of comprehensive and high-throughput experiments across different biological technologies. We have extended the FuGE object model to accommodate flow cytometry data and metadata. METHODS: We used the MagicDraw modelling tool to design a UML model (Flow-OM) according to the FuGE extension guidelines and the AndroMDA toolkit to transform the model to a markup language (Flow-ML). We mapped each MIFlowCyt term to either an existing FuGE class or to a new FuGEFlow class. The development environment was validated by comparing the official FuGE XSD to the schema we generated from the FuGE object model using our configuration. After the Flow-OM model was completed, the final version of the Flow-ML was generated and validated against an example MIFlowCyt-compliant experiment description. RESULTS: The extension of FuGE for flow cytometry has resulted in a generic FuGE-compliant data model (FuGEFlow), which accommodates and links together all information required by MIFlowCyt. The FuGEFlow model can be used to build software and databases using FuGE software toolkits to facilitate automated exchange and manipulation of potentially large flow cytometry experimental data sets. Additional project documentation, including reusable design patterns and a guide for setting up a development environment, was contributed back to the FuGE project. CONCLUSION: We have shown that an extension of FuGE can be used to transform minimum information requirements in natural language to markup language in XML. Extending FuGE required significant effort, but in our experience the benefits outweighed the costs. The FuGEFlow is expected to play a central role in describing flow cytometry experiments and ultimately facilitating data exchange, including with public flow cytometry repositories currently under development.
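
The validation step, checking an example Flow-ML document against the generated schema, corresponds to a standard XSD check. A sketch with lxml, where the inline schema and document are trivial stand-ins for the real, far richer Flow-ML artifacts:

```python
from lxml import etree

# Inline stand-ins for the generated XSD and an example document.
xsd = etree.XMLSchema(etree.XML(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="FlowExperiment">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="purpose" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))
doc = etree.XML(b"<FlowExperiment><purpose>Immunophenotyping</purpose></FlowExperiment>")
print(xsd.validate(doc))  # True if the example conforms to the schema
```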


Subject(s)
Computational Biology/methods , Flow Cytometry , Information Storage and Retrieval/methods , Programming Languages , Databases, Factual
19.
BMC Med Inform Decis Mak ; 9: 31, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19527521

ABSTRACT

BACKGROUND: Data protection is important for all information systems that deal with human-subjects data. Grid-based systems--such as the cancer Biomedical Informatics Grid (caBIG)--seek to develop new mechanisms to facilitate real-time federation of cancer-relevant data sources, including sources protected under a variety of regulatory laws, such as HIPAA and 21CFR11. These systems embody new models for data sharing, and hence pose new challenges to the regulatory community, and to those who would develop or adopt them. These challenges must be understood by both system developers and system adopters. In this paper, we describe our work collecting policy statements, expectations, and requirements from regulatory decision makers at academic cancer centers in the United States. We use these statements to examine fundamental assumptions regarding data sharing using data federations and grid computing. METHODS: An interview-based study of key stakeholders from a sample of US cancer centers. Interviews were structured, and used an instrument that was developed for the purpose of this study. The instrument included a set of problem scenarios--difficult policy situations that were derived during a full-day discussion of potentially problematic issues by a set of project participants with diverse expertise. Each problem scenario included a set of open-ended questions that were designed to elucidate stakeholder opinions and concerns. Interviews were transcribed verbatim and used for both qualitative and quantitative analysis. For quantitative analysis, data was aggregated at the individual or institutional unit of analysis, depending on the specific interview question. RESULTS: Thirty-one (31) individuals at six cancer centers were contacted to participate. Twenty-four out of thirty-one (24/31) individuals responded to our request, yielding a total response rate of 77%. Respondents included IRB directors and policy-makers, privacy and security officers, directors of offices of research, information security officers and university legal counsel. Nineteen total interviews were conducted over a period of 16 weeks. Respondents provided answers for all four scenarios (a total of 87 questions). Results were grouped by broad themes, including among others: governance, legal and financial issues, partnership agreements, de-identification, institutional technical infrastructure for security and privacy protection, training, risk management, auditing, IRB issues, and patient/subject consent. CONCLUSION: The findings suggest that with additional work, large scale federated sharing of data within a regulated environment is possible. A key challenge is developing suitable models for authentication and authorization practices within a federated environment. Authentication--the recognition and validation of a person's identity--is in fact a global property of such systems, while authorization--the permission to access data or resources--mimics data sharing agreements in being best served at a local level. Nine specific recommendations result from the work and are discussed in detail. These include: (1) the necessity to construct separate legal or corporate entities for governance of federated sharing initiatives on this scale; (2) consensus on the treatment of foreign and commercial partnerships; (3) the development of risk models and risk management processes; (4) development of technical infrastructure to support the credentialing process associated with research including human subjects; (5) exploring the feasibility of developing large-scale, federated honest broker approaches; (6) the development of suitable, federated identity provisioning processes to support federated authentication and authorization; (7) community development of requisite HIPAA and research ethics training modules by federation members; (8) the recognition of the need for central auditing requirements and authority; and (9) use of two-protocol data exchange models where possible in the federation.


Subject(s)
Biomedical Research , Computer Security/standards , Confidentiality/standards , Medical Oncology , Computer Communication Networks , Computer Security/legislation & jurisprudence , Confidentiality/legislation & jurisprudence , Databases, Factual/legislation & jurisprudence , Databases, Factual/standards , Decision Making, Organizational , Governing Board , Government Regulation , Health Insurance Portability and Accountability Act , Humans , Intellectual Property , Interviews as Topic , Organizational Policy , United States
20.
J Biomed Inform ; 39(4): 379-88, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16298556

ABSTRACT

Gene expression microarrays monitor the expression levels of thousands of genes in an experiment simultaneously. To utilize the information generated, each of the thousands of spots on a microarray image must be properly quantified, including background correction. Most existing methods require manual alignment of grids to the image data, and often still require additional minor adjustments on a spot-by-spot basis to correct for spotting irregularities. Such intervention is time consuming and also introduces inconsistency in the handling of data. A fully automatic, tested system would increase throughput and reliability in this field. In this paper, we describe WaveRead, a fully automated, standalone, open-source system for quantifying gene expression array images. Through the use of wavelet analysis to identify spot locations and diameters, the system is able to automatically grid the image and quantify signal intensities and background corrections without any user intervention. The ability of WaveRead to perform proper quantification is demonstrated by analysis both of simulated images containing spots with donut shapes, elliptical shapes, and Gaussian intensity distributions and of standard images from the National Cancer Institute.
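
WaveRead's wavelet-based gridding is more involved than can be shown here, but the overall detect-and-quantify flow can be illustrated with a simpler smooth-threshold-label pipeline on a synthetic image; this is only a conceptual stand-in for the wavelet approach.

```python
import numpy as np
from scipy import ndimage

# Synthetic "microarray" image: noisy background plus two bright spots.
rng = np.random.default_rng(0)
image = rng.normal(100, 5, (64, 64))
image[20:26, 20:26] += 80
image[40:46, 10:16] += 60

# Smooth, threshold, and label connected bright regions as spots.
smoothed = ndimage.gaussian_filter(image, sigma=1.5)
mask = smoothed > smoothed.mean() + 3 * smoothed.std()
labels, n_spots = ndimage.label(mask)
centroids = ndimage.center_of_mass(smoothed, labels, range(1, n_spots + 1))

# Simple background correction: median of the non-spot pixels.
background = np.median(image[~mask])
print(n_spots, centroids, round(float(background), 1))
```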


Subject(s)
Algorithms , Artificial Intelligence , Gene Expression Profiling/methods , Gene Expression/physiology , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Software