Search | BVS (Virtual Health Library) Regional Portal

1.

Gender parity and homophily in the Drug and Alcohol Dependence editorial process.

Schick, Melissa R; Tomko, Rachel L; Maralit, Anna M; Afzal, Zubair; Squeglia, Lindsay M; Freda, Agnieszka; Porrino, Linda; Dahne, Jennifer; McClure, Erin A; Strain, Eric C.

Drug Alcohol Depend ; 236: 109493, 2022 07 01.

Article in English | MEDLINE | ID: mdl-35605531

ABSTRACT

BACKGROUND: Despite efforts towards gender parity and some improvement over time, gender bias in peer review remains a pervasive issue. We examined gender representation and homophily in the peer review process for Drug and Alcohol Dependence (DAD). METHODS: We extracted data for papers submitted to DAD between 2004 and 2019, inclusive. Inferred gender was assigned to handling editors and reviewers using the NamSor gender inference Application Programming Interface (API). RESULTS: Men and women handling editors were approximately equally likely to invite women reviewers over time, with only a few exceptions. Over time, 47.1% of editors were women, and 42.6% of review invitations were sent to women. Men were largely consistent over time in their likelihood of accepting a review invitation, while the likelihood of women accepting a review invitation was more variable over time. Gender differences in rates of accepting a review invitation were minimal; however, as women approached half of all invited reviewers in recent years, there has been a greater trend for women, relative to men, to decline review invitations. Evidence of homophily on the part of reviewers accepting invitations was minimal, but in certain years, a tendency to accept review invitations at higher rates from editors of the same gender was observed. DISCUSSION: Given the benefits of diversity in scientific advancement, these results underline the importance of continuing efforts to increase gender diversity among editors and in reviewer pools, and the need for reviewers to be mindful of their own reviewing practices.

Subjects

Alcoholism; Alcoholism/epidemiology; Female; Humans; Male; Peer Review; Sexism

2.

ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents.

He, Jiayuan; Nguyen, Dat Quoc; Akhondi, Saber A; Druckenbrodt, Christian; Thorne, Camilo; Hoessel, Ralph; Afzal, Zubair; Zhai, Zenan; Fang, Biaoyan; Yoshikawa, Hiyori; Albahem, Ameer; Cavedon, Lawrence; Cohn, Trevor; Baldwin, Timothy; Verspoor, Karin.

Front Res Metr Anal ; 6: 654438, 2021.

Article in English | MEDLINE | ID: mdl-33870071

ABSTRACT

Chemical patents represent a valuable source of information about new chemical compounds, which is critical to the drug discovery process. Automated information extraction over chemical patents is, however, a challenging task due to the large volume of existing patents and the complex linguistic properties of chemical patents. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), was introduced to support the development of advanced text mining techniques for chemical patents. The ChEMU 2020 lab proposed two fundamental information extraction tasks focusing on chemical reaction processes described in chemical patents: (1) chemical named entity recognition, requiring identification of essential chemical entities and their roles in chemical reactions, as well as reaction conditions; and (2) event extraction, which aims at identification of event steps relating the entities involved in chemical reactions. The ChEMU 2020 lab received 37 team registrations and 46 runs. Overall, the performance of submissions for these tasks exceeded our expectations, with the top systems outperforming strong baselines. We further show the methods to be robust to variations in sampling of the test data. We provide a detailed overview of the ChEMU 2020 corpus and its annotation, showing that inter-annotator agreement is very strong. We also present the methods adopted by participants, provide a detailed analysis of their performance, and carefully consider the potential impact of data leakage on interpretation of the results. The ChEMU 2020 Lab has shown the viability of automated methods to support information extraction of key information in chemical patents.

3.

Generating and evaluating a propensity model using textual features from electronic medical records.

Afzal, Zubair; Masclee, Gwen M C; Sturkenboom, Miriam C J M; Kors, Jan A; Schuemie, Martijn J.

PLoS One ; 14(3): e0212999, 2019.

Article in English | MEDLINE | ID: mdl-30830923

ABSTRACT

BACKGROUND: Propensity score (PS) methods are commonly used to control for confounding in comparative effectiveness studies. Electronic health records (EHRs) contain much unstructured data that could be used as proxies for potential confounding factors. The goal of this study was to assess whether the unstructured information can also be used to construct PS models that deal properly with confounding. As an example, we used coxibs (COX-2 inhibitors) versus traditional NSAIDs and the risk of upper gastrointestinal bleeding, since this association is often confounded by channeling of coxibs to patients at higher risk of upper gastrointestinal bleeding. METHODS: In a cohort study of new users of nonsteroidal anti-inflammatory drugs (NSAIDs) from the Dutch Integrated Primary Care Information (IPCI) database, we identified all patients who experienced upper gastrointestinal bleeding (UGIB). We used a large-scale regularized regression to fit two PS models using all structured and unstructured information in the EHR. We calculated hazard ratios (HRs) to estimate the risk of UGIB among selective cyclo-oxygenase-2 (COX-2) inhibitor users compared to nonselective NSAID (nsNSAID) users. RESULTS: The crude hazard ratio of UGIB for COX-2 inhibitors compared to nsNSAIDs was 0.50 (95% confidence interval 0.18-1.36). Matching only on age resulted in an HR of 0.36 (0.11-1.16), and of 0.35 (0.11-1.11) when further adjusted for sex. Matching on PS only, the first model yielded an HR of 0.42 (0.13-1.38), which reduced to 0.35 (0.96-1.25) when adjusted for age and sex. The second model resulted in an HR of 0.42 (0.13-1.39), which dropped to 0.31 (0.09-1.08) after adjustment for age and sex. CONCLUSIONS: PS models can be created using unstructured information in EHRs. An incremental benefit was observed by matching on PS over traditional matching and adjustment for covariates.

Subjects

Data Interpretation, Statistical; Propensity Score; Adult; Aged; Aged, 80 and over; Cohort Studies; Confounding Factors, Epidemiologic; Cyclooxygenase 2 Inhibitors/adverse effects; Electronic Health Records/statistics & numerical data; Female; Gastrointestinal Hemorrhage/chemically induced; Gastrointestinal Hemorrhage/epidemiology; Humans; Male; Middle Aged; Netherlands/epidemiology; Proportional Hazards Models; Risk Assessment/methods

4.

Evaluation of a Pharmacist and Nurse Practitioner Smoking Cessation Program.

Afzal, Zubair; Pogge, Elizabeth; Boomershine, Virginia.

J Pharm Pract ; 30(4): 406-411, 2017 Aug.

Article in English | MEDLINE | ID: mdl-27443829

ABSTRACT

PURPOSE: To evaluate the efficacy of a smoking cessation program led by a pharmacist and a nurse practitioner. METHODS: During a 6-month period, patients attended 7 one-on-one face-to-face smoking cessation counseling sessions with a pharmacist and 1 to 2 one-on-one face-to-face smoking cessation counseling sessions with a nurse practitioner. The primary outcome was smoking cessation point prevalence rates at months 1, 3, and 5 post-quit date. Secondary outcomes included medication adherence rates at months 1, 3, and 5 post-quit date, nicotine dependence at baseline versus program end, and patient satisfaction. RESULTS: Nine (47%) of 19 total participants completed the program. Seven of the 9 patients who completed the program were smoke-free upon study completion. Point prevalence rates at months 1, 3, and 5 post-quit date were 66%, 77%, and 77%, respectively, based on patients who completed the program. Medication adherence rates were 88.6%, 54.6%, and 75% at months 1, 3, and 5 post-quit date, respectively. Based on the Fagerstrom test, nicotine dependence decreased from baseline to the end of the study, 4.89 to 0.33 (P < .001). Overall, participants rated the program highly. CONCLUSION: A joint pharmacist and nurse practitioner smoking cessation program can assist patients in becoming smoke-free.

Subjects

Nurse Practitioners/standards; Pharmacists/standards; Program Evaluation/methods; Smoking Cessation/methods; Smoking/therapy; Adult; Counseling/methods; Counseling/standards; Female; Follow-Up Studies; Humans; Male; Middle Aged; Pilot Projects; Smoking/epidemiology

5.

Chemical entity recognition in patents by combining dictionary-based and statistical approaches.

Akhondi, Saber A; Pons, Ewoud; Afzal, Zubair; van Haagen, Herman; Becker, Benedikt F H; Hettne, Kristina M; van Mulligen, Erik M; Kors, Jan A.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27141091

ABSTRACT

We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistical one. For this purpose the performance of several lexical resources was assessed using Peregrine, our open-source indexing engine. We combined our dictionary-based results on the patent corpus with the results of tmChem, a chemical recognizer using a conditional random field classifier. To improve the performance of tmChem, we utilized three additional features, viz. part-of-speech tags, lemmas and word-vector clusters. When evaluated on the training data, our final system obtained an F-score of 85.21% for the CEMP task, and an accuracy of 91.53% for the CPD task. On the test set, the best system ranked sixth among 21 teams for CEMP with an F-score of 86.82%, and second among nine teams for CPD with an accuracy of 94.23%. The differences in performance between the best ensemble system and the statistical system separately were small. Database URL: http://biosemantics.org/chemdner-patents.

Subjects

Data Mining/methods; Databases, Chemical; Machine Learning; Patents as Topic; Models, Statistical; Software

6.

Extraction of chemical-induced diseases using prior knowledge and textual information.

Pons, Ewoud; Becker, Benedikt F H; Akhondi, Saber A; Afzal, Zubair; van Mulligen, Erik M; Kors, Jan A.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27081155

ABSTRACT

We describe our approach to the chemical-disease relation (CDR) task in the BioCreative V challenge. The CDR task consists of two subtasks: automatic disease-named entity recognition and normalization (DNER), and extraction of chemical-induced diseases (CIDs) from Medline abstracts. For the DNER subtask, we used our concept recognition tool Peregrine, in combination with several optimization steps. For the CID subtask, our system, which we named RELigator, was trained on a rich feature set, comprising features derived from a graph database containing prior knowledge about chemicals and diseases, and linguistic and statistical features derived from the abstracts in the CDR training corpus. We describe the systems that were developed and present evaluation results for both subtasks on the CDR test set. For DNER, our Peregrine system reached an F-score of 0.757. For CID, the system achieved an F-score of 0.526, which ranked second among 18 participating teams. Several post-challenge modifications of the systems resulted in substantially improved F-scores (0.828 for DNER and 0.602 for CID). RELigator is available as a web service at http://biosemantics.org/index.php/software/religator.

Subjects

Computational Biology/methods; Data Mining/methods; Databases, Factual; Disease/etiology; Hazardous Substances/toxicity; Humans; Toxicogenetics

7.

ContextD: an algorithm to identify contextual properties of medical terms in a Dutch clinical corpus.

Afzal, Zubair; Pons, Ewoud; Kang, Ning; Sturkenboom, Miriam C J M; Schuemie, Martijn J; Kors, Jan A.

BMC Bioinformatics ; 15: 373, 2014 Nov 29.

Article in English | MEDLINE | ID: mdl-25432799

ABSTRACT

BACKGROUND: In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists' letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document specific enhancements, such as negation rules for general practitioners' entries and a regular expression based temporality module. RESULTS: The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. CONCLUSIONS: The ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.

Subjects

Algorithms; Medical Informatics/methods; Medical Records Systems, Computerized; Natural Language Processing; Unified Medical Language System; Databases, Factual; Humans; Netherlands
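
The trigger-based idea behind ConText-style negation detection, as summarized in the abstract above, can be illustrated with a much-simplified sketch. Everything specific here is an assumption made for illustration: the three Dutch negation triggers, the single termination term, the fixed five-token window, and the restriction to single-token terms. None of these is taken from ContextD itself, which uses its own trigger list and additional rules.

```python
import re

# Illustrative triggers only; these are NOT ContextD's actual trigger list.
NEGATION_TRIGGERS = {"geen", "niet", "zonder"}  # "no", "not", "without"
TERMINATION_TERMS = {"maar"}                     # "but" ends a trigger's scope
WINDOW = 5  # assumed number of tokens after a trigger that remain in scope


def is_negated(sentence: str, term: str) -> bool:
    """Return True if a single-token term follows a negation trigger within scope."""
    tokens = re.findall(r"\w+", sentence.lower())
    term = term.lower()
    scope_left = 0  # tokens still covered by the most recent trigger
    for token in tokens:
        if token in NEGATION_TRIGGERS:
            scope_left = WINDOW
        elif token in TERMINATION_TERMS:
            scope_left = 0
        else:
            if token == term and scope_left > 0:
                return True
            scope_left = max(0, scope_left - 1)
    return False


sentence = "Patiënt heeft geen koorts maar wel hoesten"
print(is_negated(sentence, "koorts"))   # term directly after the trigger "geen"
print(is_negated(sentence, "hoesten"))  # term after the termination term "maar"
```

In this toy sentence the first term falls inside the trigger's scope and the second does not, because the termination term resets the scope before it is reached.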

8.

Knowledge-based extraction of adverse drug events from biomedical text.

Kang, Ning; Singh, Bharat; Bui, Chinh; Afzal, Zubair; van Mulligen, Erik M; Kors, Jan A.

BMC Bioinformatics ; 15: 64, 2014 Mar 04.

Article in English | MEDLINE | ID: mdl-24593054

ABSTRACT

BACKGROUND: Many biomedical relation extraction systems are machine-learning based and have to be trained on large annotated corpora that are expensive and cumbersome to construct. We developed a knowledge-based relation extraction system that requires minimal training data, and applied the system for the extraction of adverse drug events from biomedical text. The system consists of a concept recognition module that identifies drugs and adverse effects in sentences, and a knowledge-base module that establishes whether a relation exists between the recognized concepts. The knowledge base was filled with information from the Unified Medical Language System. The performance of the system was evaluated on the ADE corpus, consisting of 1644 abstracts with manually annotated adverse drug events. Fifty abstracts were used for training, the remaining abstracts were used for testing. RESULTS: The knowledge-based system obtained an F-score of 50.5%, which was 34.4 percentage points better than the co-occurrence baseline. Increasing the training set to 400 abstracts improved the F-score to 54.3%. When the system was compared with a machine-learning system, jSRE, on a subset of the sentences in the ADE corpus, our knowledge-based system achieved an F-score that is 7 percentage points higher than the F-score of jSRE trained on 50 abstracts, and still 2 percentage points higher than jSRE trained on 90% of the corpus. CONCLUSION: A knowledge-based approach can be successfully used to extract adverse drug events from biomedical text without need for a large training set. Whether use of a knowledge base is equally advantageous for other biomedical relation-extraction tasks remains to be investigated.

Subjects

Artificial Intelligence; Data Mining/methods; Drug-Related Side Effects and Adverse Reactions; Knowledge Bases; Humans; Unified Medical Language System
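
The two-module design described in the abstract above (concept recognition followed by a knowledge-base check) and the co-occurrence baseline it is compared against can be contrasted in a toy sketch. The vocabularies, the set of known links, and the function names are hypothetical placeholders; the published system draws its knowledge from the Unified Medical Language System and is considerably more elaborate than a set lookup.

```python
from itertools import product

# Toy vocabularies and links, invented for illustration (not drawn from the UMLS).
DRUGS = {"aspirin", "metformin"}
EFFECTS = {"nausea", "bleeding"}
KNOWN_LINKS = {("aspirin", "bleeding"), ("metformin", "nausea")}


def recognize(sentence: str, vocabulary: set[str]) -> list[str]:
    """Concept recognition reduced to exact single-word dictionary lookup."""
    words = set(sentence.lower().replace(".", " ").replace(",", " ").split())
    return sorted(words & vocabulary)


def cooccurrence_pairs(sentence: str) -> list[tuple[str, str]]:
    """Baseline: every drug and effect found in the same sentence is a relation."""
    return list(product(recognize(sentence, DRUGS), recognize(sentence, EFFECTS)))


def knowledge_based_pairs(sentence: str) -> list[tuple[str, str]]:
    """Keep only co-occurring pairs that the knowledge base supports."""
    return [pair for pair in cooccurrence_pairs(sentence) if pair in KNOWN_LINKS]


sentence = "The patient on aspirin and metformin reported bleeding."
print(cooccurrence_pairs(sentence))
print(knowledge_based_pairs(sentence))
```

The baseline proposes both drugs as related to the effect, while the knowledge-base filter keeps only the pair it has a link for, which is the kind of precision gain the abstract attributes to the knowledge-based module.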

9.

Automatic generation of case-detection algorithms to identify children with asthma from large electronic health record databases.

Afzal, Zubair; Engelkes, Marjolein; Verhamme, Katia M C; Janssens, Hettie M; Sturkenboom, Miriam C J M; Kors, Jan A; Schuemie, Martijn J.

Pharmacoepidemiol Drug Saf ; 22(8): 826-33, 2013 Aug.

Article in English | MEDLINE | ID: mdl-23592573

ABSTRACT

PURPOSE: Most electronic health record databases contain unstructured free-text narratives, which cannot be easily analyzed. Case-detection algorithms are usually created manually and often rely only on using coded information such as International Classification of Diseases version 9 codes. We applied a machine-learning approach to generate and evaluate an automated case-detection algorithm that uses both free-text and coded information to identify asthma cases. METHODS: The Integrated Primary Care Information (IPCI) database was searched for potential asthma patients aged 5-18 years using a broad query on asthma-related codes, drugs, and free text. A training set of 5032 patients was created by manually annotating the potential patients as definite, probable, or doubtful asthma cases or non-asthma cases. The rule-learning program RIPPER was then used to generate algorithms to distinguish cases from non-cases. An over-sampling method was used to balance the performance of the automated algorithm to meet our study requirements. Performance of the automated algorithm was evaluated against the manually annotated set. RESULTS: The selected algorithm yielded a positive predictive value (PPV) of 0.66, sensitivity of 0.98, and specificity of 0.95 when identifying only definite asthma cases; a PPV of 0.82, sensitivity of 0.96, and specificity of 0.90 when identifying both definite and probable asthma cases; and a PPV of 0.57, sensitivity of 0.95, and specificity of 0.67 for the scenario identifying definite, probable, and doubtful asthma cases. CONCLUSIONS: The automated algorithm shows good performance in detecting cases of asthma utilizing both free-text and coded data. This algorithm will facilitate large-scale studies of asthma in the IPCI database.

Subjects

Algorithms; Asthma/epidemiology; Electronic Health Records; Adolescent; Child; Child, Preschool; Databases, Factual; Electronic Data Processing; Humans; Predictive Value of Tests; Sensitivity and Specificity
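
For readers less familiar with the three measures reported in the abstract above, the sketch below computes them from a confusion matrix and shows how positive predictive value depends on prevalence. The counts are hypothetical and chosen only to give round numbers; they are not the study's data, and the function names are illustrative.

```python
def case_detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """PPV, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # flagged patients who are true cases
        "sensitivity": tp / (tp + fn),  # true cases that were flagged
        "specificity": tn / (tn + fp),  # non-cases that were not flagged
    }


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV implied by sensitivity, specificity, and case prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts: 100 true cases and 200 non-cases.
metrics = case_detection_metrics(tp=80, fp=20, tn=180, fn=20)
print(metrics)
print(ppv_from_rates(0.8, 0.9, prevalence=100 / 300))
```

The second function makes explicit why an algorithm can combine high sensitivity and specificity with a lower PPV: when true cases are a small share of the candidates, even a small false-positive rate produces many false alarms relative to true cases.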

10.

Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records.

Afzal, Zubair; Schuemie, Martijn J; van Blijderveen, Jan C; Sen, Elif F; Sturkenboom, Miriam C J M; Kors, Jan A.

BMC Med Inform Decis Mak ; 13: 30, 2013 Mar 02.

Article in English | MEDLINE | ID: mdl-23452306

ABSTRACT

BACKGROUND: Distinguishing cases from non-cases in free-text electronic medical records is an important initial step in observational epidemiological studies, but manual record validation is time-consuming and cumbersome. We compared different approaches to develop an automatic case identification system with high sensitivity to assist manual annotators. METHODS: We used four different machine-learning algorithms to build case identification systems for two data sets, one comprising hepatobiliary disease patients, the other acute renal failure patients. To improve the sensitivity of the systems, we varied the imbalance ratio between positive cases and negative cases using under- and over-sampling techniques, and applied cost-sensitive learning with various misclassification costs. RESULTS: For the hepatobiliary data set, we obtained a high sensitivity of 0.95 (on a par with manual annotators, as compared to 0.91 for a baseline classifier) with specificity 0.56. For the acute renal failure data set, sensitivity increased from 0.69 to 0.89, with specificity 0.59. Performance differences between the various machine-learning algorithms were not large. Classifiers performed best when trained on data sets with imbalance ratio below 10. CONCLUSIONS: We were able to achieve high sensitivity with moderate specificity for automatic case identification on two data sets of electronic medical records. Such a high-sensitive case identification system can be used as a pre-filter to significantly reduce the burden of manual record validation.

Subjects

Acute Kidney Injury/epidemiology; Artificial Intelligence; Biliary Tract Diseases/epidemiology; Data Collection/methods; Electronic Health Records; Liver Diseases/epidemiology; Algorithms; Humans
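
The under- and over-sampling mentioned in the abstract above can be sketched as plain random resampling toward a target imbalance ratio. This is a minimal illustration under stated assumptions: the imbalance ratio is taken to mean majority size divided by minority size, sampling is purely random, and the function names and record lists are hypothetical. The abstract does not specify the exact procedures the authors used.

```python
import random


def undersample_majority(majority: list, minority: list, ratio: float,
                         rng: random.Random) -> tuple[list, list]:
    """Randomly drop majority records until len(majority) is about ratio * len(minority)."""
    keep = min(len(majority), round(ratio * len(minority)))
    return rng.sample(majority, keep), list(minority)


def oversample_minority(majority: list, minority: list, ratio: float,
                        rng: random.Random) -> tuple[list, list]:
    """Randomly duplicate minority records until len(majority) is about ratio * len(minority)."""
    target = max(len(minority), round(len(majority) / ratio))
    extra = rng.choices(minority, k=target - len(minority))
    return list(majority), list(minority) + extra


rng = random.Random(0)                  # fixed seed so the sketch is repeatable
non_cases = list(range(1000))           # hypothetical record identifiers
cases = list(range(1000, 1050))         # imbalance ratio 1000 / 50 = 20

maj_u, min_u = undersample_majority(non_cases, cases, ratio=5, rng=rng)
maj_o, min_o = oversample_minority(non_cases, cases, ratio=5, rng=rng)
print(len(maj_u), len(min_u))  # sizes after under-sampling
print(len(maj_o), len(min_o))  # sizes after over-sampling
```

Under-sampling discards data from the larger class, while over-sampling keeps all records but repeats some from the smaller class; either way the classifier sees a training set closer to the chosen ratio.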

11.

Using rule-based natural language processing to improve disease normalization in biomedical text.

Kang, Ning; Singh, Bharat; Afzal, Zubair; van Mulligen, Erik M; Kors, Jan A.

J Am Med Inform Assoc ; 20(5): 876-81, 2013.

Article in English | MEDLINE | ID: mdl-23043124

ABSTRACT

BACKGROUND AND OBJECTIVE: In order for computers to extract useful information from unstructured text, a concept normalization system is needed to link relevant concepts in a text to sources that contain further information about the concept. Popular concept normalization tools in the biomedical field are dictionary-based. In this study we investigate the usefulness of natural language processing (NLP) as an adjunct to dictionary-based concept normalization. METHODS: We compared the performance of two biomedical concept normalization systems, MetaMap and Peregrine, on the Arizona Disease Corpus, with and without the use of a rule-based NLP module. Performance was assessed for exact and inexact boundary matching of the system annotations with those of the gold standard and for concept identifier matching. RESULTS: Without the NLP module, MetaMap and Peregrine attained F-scores of 61.0% and 63.9%, respectively, for exact boundary matching, and 55.1% and 56.9% for concept identifier matching. With the aid of the NLP module, the F-scores of MetaMap and Peregrine improved to 73.3% and 78.0% for boundary matching, and to 66.2% and 69.8% for concept identifier matching. For inexact boundary matching, performances further increased to 85.5% and 85.4%, and to 73.6% and 73.3% for concept identifier matching. CONCLUSIONS: We have shown the added value of NLP for the recognition and normalization of diseases with MetaMap and Peregrine. The NLP module is general and can be applied in combination with any concept normalization system. Whether its use for concept types other than disease is equally advantageous remains to be investigated.

Subjects

Disease/classification; Information Storage and Retrieval/methods; Natural Language Processing; Terminology as Topic; Humans; Unified Medical Language System; Vocabulary, Controlled
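
The difference between exact and inexact boundary matching in the abstract above can be made concrete with a short sketch. It assumes that "inexact" means any character overlap between a system span and a gold-standard span, which is one common convention; the abstract does not define the criterion, and the spans below are invented.

```python
from collections.abc import Callable

Span = tuple[int, int]  # (start, end) character offsets, end exclusive


def exact_match(a: Span, b: Span) -> bool:
    """Both boundaries must coincide."""
    return a == b


def inexact_match(a: Span, b: Span) -> bool:
    """Assumed criterion: the spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(system: list[Span], gold: list[Span],
            match: Callable[[Span, Span], bool]) -> float:
    """Balanced F-score under the given span-matching criterion."""
    matched_system = sum(any(match(s, g) for g in gold) for s in system)
    matched_gold = sum(any(match(s, g) for s in system) for g in gold)
    precision = matched_system / len(system) if system else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [(0, 12), (20, 28), (40, 46)]
system = [(0, 12), (22, 28), (60, 65)]  # one exact hit, one partial hit, one miss
print(f_score(system, gold, exact_match))
print(f_score(system, gold, inexact_match))
```

The partially overlapping span counts as an error under exact matching and as a hit under inexact matching, which is why inexact scores are never lower than exact ones for the same output.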

12.

Using an ensemble system to improve concept extraction from clinical records.

Kang, Ning; Afzal, Zubair; Singh, Bharat; van Mulligen, Erik M; Kors, Jan A.

J Biomed Inform ; 45(3): 423-8, 2012 Jun.

Article in English | MEDLINE | ID: mdl-22239956

ABSTRACT

Recognition of medical concepts is a basic step in information extraction from clinical records. We wished to improve on the performance of a variety of concept recognition systems by combining their individual results. We selected two dictionary-based systems and five statistical-based systems that were trained to annotate medical problems, tests, and treatments in clinical records. Manually annotated clinical records for training and testing were made available through the 2010 i2b2/VA (Informatics for Integrating Biology and the Bedside) challenge. Results of individual systems were combined by a simple voting scheme. The statistical systems were trained on a set of 349 records. Performance (precision, recall, F-score) was assessed on a test set of 477 records, using varying voting thresholds. The combined annotation system achieved a best F-score of 82.2% (recall 81.2%, precision 83.3%) on the test set, a score that ranks third among 22 participants in the i2b2/VA concept annotation task. The ensemble system had better precision and recall than any of the individual systems, yielding an F-score that is 4.6 percentage points higher than the best single system. Changing the voting threshold offered a simple way to obtain a system with high precision (and moderate recall) or one with high recall (and moderate precision). The ensemble-based approach is straightforward and allows the balancing of precision versus recall of the combined system. The ensemble system is freely available and can easily be extended, integrated in other systems, and retrained.

Subjects

Data Mining/methods; Electronic Health Records; Humans; Natural Language Processing; Semantics
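
The simple voting scheme and the precision/recall trade-off described in the abstract above can be sketched as follows. The annotation format (start offset, end offset, concept type), the three toy system outputs, and the function names are assumptions made for illustration; the published ensemble combines seven systems and may differ in how annotations are compared.

```python
from collections import Counter

# An annotation is (start_offset, end_offset, concept_type); values are illustrative.
Annotation = tuple[int, int, str]


def vote(system_outputs: list[set[Annotation]], threshold: int) -> set[Annotation]:
    """Keep every annotation proposed by at least `threshold` systems."""
    counts = Counter(ann for output in system_outputs for ann in output)
    return {ann for ann, n in counts.items() if n >= threshold}


def precision_recall_f(predicted: set[Annotation],
                       gold: set[Annotation]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and balanced F-score."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


systems = [
    {(0, 8, "problem"), (15, 20, "test")},
    {(0, 8, "problem"), (30, 41, "treatment")},
    {(0, 8, "problem"), (15, 20, "test"), (50, 55, "test")},
]
gold = {(0, 8, "problem"), (15, 20, "test"), (30, 41, "treatment")}

for threshold in (1, 2, 3):
    print(threshold, precision_recall_f(vote(systems, threshold), gold))
```

On this toy data a threshold of 1 keeps every proposal (full recall, some false positives), while higher thresholds keep only widely agreed annotations (full precision, lower recall), mirroring the balancing effect the abstract reports.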
