1.
J Clin Transl Sci ; 7(1): e199, 2023.
Article in English | MEDLINE | ID: mdl-37830010

ABSTRACT

Background: Randomized clinical trials (RCTs) are the foundation for medical advances, but participant recruitment remains a persistent barrier to their success. This retrospective data analysis aims to (1) identify clinical trial features associated with successful participant recruitment, measured by accrual percentage, and (2) compare the characteristics of the RCTs with the most and least successful recruitment, as defined by varying thresholds of accrual percentage such as ≥ 90% vs ≤ 10%, ≥ 80% vs ≤ 20%, and ≥ 70% vs ≤ 30%. Methods: Data from the internal research registry at Columbia University Irving Medical Center and Aggregated Analysis of ClinicalTrials.gov were collected for 393 randomized interventional treatment studies closed to further enrollment. We compared two regularized linear regression and six tree-based machine learning models for accrual percentage (i.e., reported accrual to date divided by the target accrual) prediction. The best-performing model and Tree SHapley Additive exPlanations were used for feature importance analysis for participant recruitment. The identified features were compared between the two subgroups. Results: CatBoost regressor outperformed the others. Key features positively associated with recruitment success, as measured by accrual percentage, include government funding and compensation. Meanwhile, cancer research and non-conventional recruitment methods (e.g., websites) are negatively associated with recruitment success. Statistically significant subgroup differences (corrected p-value < .05) were found in 15 of the top 30 most important features. Conclusion: This multi-source retrospective study highlighted key features influencing RCT participant recruitment, offering actionable steps for improvement, including flexible recruitment infrastructure and appropriate participant compensation.
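
The accrual percentage defined in the abstract above (reported accrual to date divided by target accrual) and the paired thresholds used to form the most and least successful subgroups can be illustrated with a minimal Python sketch. The function names, the "middle" label, and the trial records are hypothetical and are not taken from the study.

```python
def accrual_percentage(reported_accrual: int, target_accrual: int) -> float:
    """Reported accrual to date divided by target accrual, as a percentage."""
    if target_accrual <= 0:
        raise ValueError("target_accrual must be positive")
    return 100.0 * reported_accrual / target_accrual


def recruitment_group(pct: float, high: float = 90.0, low: float = 10.0) -> str:
    """Label a trial for one threshold pair, e.g. >= 90% vs <= 10%."""
    if pct >= high:
        return "most"
    if pct <= low:
        return "least"
    return "middle"  # hypothetical label for trials outside both subgroups


# Hypothetical trials: (reported accrual, target accrual)
trials = {"trial_a": (95, 100), "trial_b": (4, 50), "trial_c": (30, 60)}
for name, (reported, target) in trials.items():
    pct = accrual_percentage(reported, target)
    print(name, round(pct, 1), recruitment_group(pct))
```

Passing other threshold pairs (for example high=80, low=20) reproduces the alternative subgroup definitions listed in the abstract.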

3.
JAMA Netw Open ; 6(6): e2320455, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37368404

ABSTRACT

This cross-sectional study evaluates the extent of housing unaffordability among US residency programs.


Subjects
Housing , Physicians , Humans , Socioeconomic Factors , Costs and Cost Analysis
4.
AMIA Jt Summits Transl Sci Proc ; 2023: 281-290, 2023.
Article in English | MEDLINE | ID: mdl-37350899

ABSTRACT

Participant recruitment continues to be a challenge to the success of randomized controlled trials, resulting in increased costs, extended trial timelines and delayed treatment availability. Literature provides evidence that study design features (e.g., trial phase, study site involvement) and trial sponsor are significantly associated with recruitment success. Principal investigators oversee the conduct of clinical trials, including recruitment. Through a cross-sectional survey and a thematic analysis of free-text responses, we assessed the perceptions of sixteen principal investigators regarding success factors for participant recruitment. Study site involvement and funding source do not necessarily make recruitment easier or more challenging from the perspective of the principal investigators. The most commonly used recruitment strategies are also the most effort inefficient (e.g., in-person recruitment, reviewing the electronic medical records for prescreening). Finally, we recommended actionable steps, such as improving staff support and leveraging informatics-driven approaches, to allow clinical researchers to enhance participant recruitment.

5.
Stud Health Technol Inform ; 290: 297-300, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673021

ABSTRACT

Electronic healthcare records data promises to improve the efficiency of patient eligibility screening, which is an important factor in the success of clinical trials and observational studies. To bridge the sociotechnical gap in cohort identification by end-users, who are clinicians or researchers unfamiliar with underlying EHR databases, we previously developed a natural language query interface named Criteria2Query (C2Q) that automatically transforms free-text eligibility criteria to executable database queries. In this study, we present a comprehensive evaluation of C2Q to generate more actionable insights to inform the design and evaluation of future natural language user interfaces for clinical databases, towards the realization of Augmented Intelligence (AI) for clinical cohort definition via e-screening.


Subjects
Artificial Intelligence , Natural Language Processing , Databases, Factual , Eligibility Determination , Humans , Intelligence
6.
Stud Health Technol Inform ; 290: 309-313, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673024

ABSTRACT

The rapid growth of clinical trials launched in recent years poses significant challenges for accurate and efficient trial search. Keyword-based clinical trial search engines require users to construct effective queries, which can be a difficult task given complex information needs. In this study, we present an interactive clinical trial search interface that retrieves trials similar to a target clinical trial. It enables user configuration of 13 clinical trial features and 4 metrics (Jaccard similarity, semantic-based similarity, temporal overlap and geographical distance) to measure pairwise trial similarities. Among 1,007 coronavirus disease 2019 (COVID-19) trials conducted in the United States, 91.9% were found to have similar trials with the similarity threshold being 0.85 and 43.8% were highly similar with the threshold 0.95. A simulation study using 3 groups of similar trials curated by COVID-19 clinical trial reviews demonstrates the precision and recall of the search interface.


Subjects
COVID-19 , Benchmarking , Data Collection , Humans , Search Engine , Semantics
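
Of the four similarity metrics named in the abstract above, Jaccard similarity is the simplest to illustrate. The following Python sketch assumes each trial is reduced to a set of feature values; the feature values, the treatment of two empty sets, and the helper names are hypothetical and do not describe the interface's actual implementation.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # assumption: two empty feature sets count as identical
    return len(a & b) / len(a | b)


def is_similar(a: set, b: set, threshold: float = 0.85) -> bool:
    """Apply a similarity threshold such as the 0.85 quoted in the abstract."""
    return jaccard_similarity(a, b) >= threshold


# Hypothetical feature sets for a target trial and a candidate trial
target = {"remdesivir", "hospitalized", "adult"}
candidate = {"remdesivir", "hospitalized", "adult", "supplemental oxygen"}

print(jaccard_similarity(target, candidate))  # 3 shared / 4 total
print(is_similar(target, candidate))
```

A set-based measure like this ignores ordering and frequency, which is presumably why the interface pairs it with semantic, temporal, and geographical measures.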
7.
Phys Rev Lett ; 128(19): 198003, 2022 May 13.
Article in English | MEDLINE | ID: mdl-35622032

ABSTRACT

Disordered packings of unbonded, semiflexible fibers represent a class of materials spanning contexts and scales. From twig-based bird nests to unwoven textiles, bulk mechanics of disparate systems emerge from the bending of constituent slender elements about impermanent contacts. In experimental and computational packings of wooden sticks, we identify prominent features of their response to cyclic oedometric compression: nonlinear stiffness, transient plasticity, and eventually repeatable velocity-independent hysteresis. We trace these features to their micromechanic origins, identified in characteristic appearance, disappearance, and displacement of internal contacts.

8.
Int J Med Inform ; 156: 104587, 2021 12.
Article in English | MEDLINE | ID: mdl-34624661

ABSTRACT

BACKGROUND: Cardiovascular outcome trials (CVOTs) include patients with high risks for cardiovascular events based on specific inclusion criteria. Little is known about the impact of such inclusion criteria on patient accrual and the incidence rate of cardiovascular events. MATERIALS AND METHODS: We evaluated the impact of criteria on the accrual and the number of cardiovascular events in a cohort of 1544 diabetes patients identified from the clinical data warehouse of New York Presbyterian Hospital / Columbia University Irving Medical Center. RESULTS: The highest incidence rate of the composite events (i.e., cardiovascular mortality, stroke, and myocardial infarction) was observed when the inclusion criteria seek patients with underlying cardiovascular diseases or age ≥ 60 with at least two of the risk factors including duration of diabetes, hypertension, dyslipidemia, smoking status, and albuminuria. CONCLUSION: Our study shows that the electronic health records could be utilized to optimize the inclusion criteria while balancing study inclusiveness and number of events.


Subjects
Cardiovascular Diseases , Diabetes Mellitus , Hypertension , Myocardial Infarction , Cardiovascular Diseases/epidemiology , Electronic Health Records , Humans , Risk Factors
9.
Appl Clin Inform ; 12(4): 816-825, 2021 08.
Article in English | MEDLINE | ID: mdl-34496418

ABSTRACT

BACKGROUND: Clinical trials are the gold standard for generating robust medical evidence, but clinical trial results often raise generalizability concerns, which can be attributed to the lack of population representativeness. The electronic health records (EHRs) data are useful for estimating the population representativeness of clinical trial study population. OBJECTIVES: This research aims to estimate the population representativeness of clinical trials systematically using EHR data during the early design stage. METHODS: We present an end-to-end analytical framework for transforming free-text clinical trial eligibility criteria into executable database queries conformant with the Observational Medical Outcomes Partnership Common Data Model and for systematically quantifying the population representativeness for each clinical trial. RESULTS: We calculated the population representativeness of 782 novel coronavirus disease 2019 (COVID-19) trials and 3,827 type 2 diabetes mellitus (T2DM) trials in the United States respectively using this framework. With the use of overly restrictive eligibility criteria, 85.7% of the COVID-19 trials and 30.1% of T2DM trials had poor population representativeness. CONCLUSION: This research demonstrates the potential of using the EHR data to assess the clinical trials population representativeness, providing data-driven metrics to inform the selection and optimization of eligibility criteria.


Subjects
COVID-19 , Diabetes Mellitus, Type 2 , Electronic Health Records , Humans , Patient Selection , SARS-CoV-2 , United States
10.
AMIA Jt Summits Transl Sci Proc ; 2021: 394-403, 2021.
Article in English | MEDLINE | ID: mdl-34457154

ABSTRACT

Human annotations are the established gold standard for evaluating natural language processing (NLP) methods. The goals of this study are to quantify and qualify the disagreement between human and NLP annotations. We developed an NLP system for annotating clinical trial eligibility criteria text and constructed a manually annotated corpus, both following the OMOP Common Data Model (CDM). We analyzed the discrepancies between the human and NLP annotations and their causes (e.g., ambiguities in concept categorization and tacit decisions on inclusion of qualifiers and temporal attributes during concept annotation). This study initially reported complexities in clinical trial eligibility criteria text that complicate NLP and the limitations of the OMOP CDM. The disagreement between human and NLP annotations may be generalizable. We discuss implications for NLP evaluation.


Subjects
Natural Language Processing , Humans
11.
JAMIA Open ; 4(2): ooab028, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34142015

ABSTRACT

OBJECTIVE: Feature engineering is a major bottleneck in phenotyping. Properly learned medical concept embeddings (MCEs) capture the semantics of medical concepts and are thus useful for retrieving relevant medical features in phenotyping tasks. We compared the effectiveness of MCEs learned from knowledge graphs and electronic healthcare records (EHR) data in retrieving relevant medical features for phenotyping tasks. MATERIALS AND METHODS: We implemented 5 embedding methods including node2vec, singular value decomposition (SVD), LINE, skip-gram, and GloVe with 2 data sources: (1) knowledge graphs obtained from the observational medical outcomes partnership (OMOP) common data model; and (2) patient-level data obtained from the OMOP compatible electronic health records (EHR) from Columbia University Irving Medical Center (CUIMC). We used phenotypes with their relevant concepts developed and validated by the electronic medical records and genomics (eMERGE) network to evaluate the performance of learned MCEs in retrieving phenotype-relevant concepts. Hits@k% in retrieving phenotype-relevant concepts based on a single and multiple seed concept(s) was used to evaluate MCEs. RESULTS: Among all MCEs, MCEs learned by using node2vec with knowledge graphs showed the best performance. Of MCEs based on knowledge graphs and EHR data, MCEs learned by using node2vec with knowledge graphs and MCEs learned by using GloVe with EHR data outperformed the other MCEs, respectively. CONCLUSION: MCE enables scalable feature engineering tasks, thereby facilitating phenotyping. Based on current phenotyping practices, MCEs learned by using knowledge graphs constructed by hierarchical relationships among medical concepts outperformed MCEs learned by using EHR data.
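
The abstract above does not define Hits@k% precisely. One plausible reading, sketched below in Python under that assumption, ranks all other concepts by cosine similarity to a single seed concept and reports the fraction of phenotype-relevant concepts that fall within the top k% of the ranking. The toy two-dimensional embeddings, concept names, and function names are hypothetical and are not the paper's data or code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def hits_at_k_percent(embeddings, seed, relevant, k_percent):
    """Fraction of relevant concepts found in the top k% nearest to the seed."""
    candidates = [c for c in embeddings if c != seed]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(embeddings[seed], embeddings[c]),
        reverse=True,
    )
    top_n = max(1, math.ceil(len(ranked) * k_percent / 100))
    top = set(ranked[:top_n])
    targets = set(relevant) - {seed}
    return len(top & targets) / len(targets)


# Hypothetical 2-d embeddings; real MCEs would have many more dimensions
EMB = {
    "t2dm": (1.0, 0.0),
    "metformin": (0.9, 0.1),
    "hba1c": (0.8, 0.3),
    "fracture": (0.0, 1.0),
    "asthma": (0.1, 0.9),
}

print(hits_at_k_percent(EMB, "t2dm", {"metformin", "hba1c"}, 50))
print(hits_at_k_percent(EMB, "t2dm", {"metformin", "hba1c"}, 25))
```

Extending the sketch to multiple seed concepts would require a choice the abstract does not specify, such as averaging the seed vectors.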

12.
Stud Health Technol Inform ; 281: 148-152, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042723

ABSTRACT

2,719 distinctive phenotyping variables from 176 electronic phenotypes were compared with 57,150 distinctive clinical trial eligibility criteria concepts to assess the phenotype knowledge overlap between them. We observed a high percentage (69.5%) of eMERGE phenotype features and a lower percentage (47.6%) of OHDSI phenotype features matched to clinical trial eligibility criteria, possibly due to the relative emphasis on specificity for eMERGE phenotypes and the relative emphasis on sensitivity for OHDSI phenotypes. The study results show the potential of reusing clinical trial eligibility criteria for phenotyping feature selection and moderate benefits of using them for local cohort query implementation.


Subjects
Algorithms , Electronic Health Records , Electronics , Phenotype
13.
J Biomed Inform ; 118: 103790, 2021 06.
Article in English | MEDLINE | ID: mdl-33887457

ABSTRACT

Clinical trials are essential for generating reliable medical evidence, but often suffer from expensive and delayed patient recruitment because the unstructured eligibility criteria description prevents automatic query generation for eligibility screening. In response to the COVID-19 pandemic, many trials have been created but their information is not computable. We included 700 COVID-19 trials available at the point of study and developed a semi-automatic approach to generate an annotated corpus for COVID-19 clinical trial eligibility criteria called COVIC. A hierarchical annotation schema based on the OMOP Common Data Model was developed to accommodate four levels of annotation granularity: i.e., study cohort, eligibility criteria, named entity and standard concept. In COVIC, 39 trials with more than one study cohort were identified and labelled with an identifier for each cohort. 1,943 criteria for non-clinical characteristics such as "informed consent" and "exclusivity of participation" were annotated. 9,767 criteria were represented by 18,161 entities in 8 domains, 7,743 attributes of 7 attribute types and 16,443 relationships of 11 relationship types. 17,171 entities were mapped to standard medical concepts and 1,009 attributes were normalized into computable representations. COVIC can serve as a corpus indexed by semantic tags for COVID-19 trial search and analytics, and a benchmark for machine learning based criteria extraction.


Subjects
COVID-19 , Clinical Trials as Topic , Computer Simulation , Eligibility Determination , Humans , Machine Learning , Pandemics
14.
J Biomed Inform ; 117: 103771, 2021 05.
Article in English | MEDLINE | ID: mdl-33813032

ABSTRACT

OBJECTIVE: We present the Clinical Trial Knowledge Base, a regularly updated knowledge base of discrete clinical trial eligibility criteria equipped with a web-based user interface for querying and aggregate analysis of common eligibility criteria. MATERIALS AND METHODS: We used a natural language processing (NLP) tool named Criteria2Query (Yuan et al., 2019) to transform free-text clinical trial eligibility criteria from ClinicalTrials.gov into discrete criteria concepts and attributes encoded using the widely adopted Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and stored in a relational SQL database. A web application accessible via RESTful APIs was implemented to enable queries and visual aggregate analyses. We demonstrate CTKB's potential role in EHR phenotype knowledge engineering using ten validated phenotyping algorithms. RESULTS: At the time of writing, CTKB contained 87,504 distinctive OMOP CDM standard concepts, including Condition (47.82%), Drug (23.01%), Procedure (13.73%), Measurement (24.70%) and Observation (5.28%), with 34.78% for inclusion criteria and 65.22% for exclusion criteria, extracted from 352,110 clinical trials. The average hit rate of criteria concepts in eMERGE phenotype algorithms is 77.56%. CONCLUSION: CTKB is a novel comprehensive knowledge base of discrete eligibility criteria concepts with the potential to enable knowledge engineering for clinical trial cohort definition, clinical trial population representativeness assessment, electronic phenotyping, and data gap analyses for using electronic health records to support clinical trial recruitment.


Subjects
Knowledge Bases , Natural Language Processing , Algorithms , Clinical Trials as Topic , Databases, Factual , Electronic Health Records , Humans
15.
J Am Med Inform Assoc ; 28(1): 14-22, 2021 01 15.
Article in English | MEDLINE | ID: mdl-33260201

ABSTRACT

OBJECTIVE: This research aims to evaluate the impact of eligibility criteria on recruitment and observable clinical outcomes of COVID-19 clinical trials using electronic health record (EHR) data. MATERIALS AND METHODS: On June 18, 2020, we identified frequently used eligibility criteria from all the interventional COVID-19 trials in ClinicalTrials.gov (n = 288), including age, pregnancy, oxygen saturation, alanine/aspartate aminotransferase, platelets, and estimated glomerular filtration rate. We applied the frequently used criteria to the EHR data of COVID-19 patients in Columbia University Irving Medical Center (CUIMC) (March 2020-June 2020) and evaluated their impact on patient accrual and the occurrence of a composite endpoint of mechanical ventilation, tracheostomy, and in-hospital death. RESULTS: There were 3251 patients diagnosed with COVID-19 from the CUIMC EHR included in the analysis. The median follow-up period was 10 days (interquartile range 4-28 days). The composite events occurred in 18.1% (n = 587) of the COVID-19 cohort during the follow-up. In a hypothetical trial with common eligibility criteria, 33.6% (690/2051) were eligible among patients with evaluable data and 22.2% (153/690) had the composite event. DISCUSSION: By adjusting the thresholds of common eligibility criteria based on the characteristics of COVID-19 patients, we could observe more composite events from fewer patients. CONCLUSIONS: This research demonstrated the potential of using the EHR data of COVID-19 patients to inform the selection of eligibility criteria and their thresholds, supporting data-driven optimization of participant selection towards improved statistical power of COVID-19 trials.


Subjects
COVID-19/therapy , Clinical Trials as Topic , Electronic Health Records , Eligibility Determination , Adolescent , Adult , Aged, 80 and over , COVID-19/mortality , Female , Hospital Mortality , Humans , Male , Middle Aged , Oxygen/blood , Patient Selection , Pregnancy , Research Design , Respiration, Artificial , SARS-CoV-2 , Tracheostomy , Treatment Outcome , Young Adult
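
The percentages reported in the abstract above follow directly from the counts it gives, and the short Python check below recomputes them. Only the counts come from the abstract; the helper name and the derived ratio of event rates are illustrative additions.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# Counts quoted in the abstract
cohort_size = 3251        # COVID-19 patients in the CUIMC EHR cohort
cohort_events = 587       # composite events in the full cohort
evaluable = 2051          # patients with evaluable data
eligible = 690            # eligible under the common criteria
eligible_events = 153     # composite events among eligible patients

event_rate_all = pct(cohort_events, cohort_size)
eligibility_rate = pct(eligible, evaluable)
event_rate_eligible = pct(eligible_events, eligible)

print(event_rate_all, eligibility_rate, event_rate_eligible)
# Illustrative only: how much the event rate rises among eligible patients
print(round(event_rate_eligible / event_rate_all, 2))
```

The comparison of the two event rates is the quantity the DISCUSSION section turns on: applying the criteria keeps about a third of evaluable patients while raising the share who experience the composite event.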
16.
J Am Med Inform Assoc ; 28(3): 616-621, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33216120

ABSTRACT

Clinical trials are the gold standard for generating reliable medical evidence. The biggest bottleneck in clinical trials is recruitment. To facilitate recruitment, tools for patient search of relevant clinical trials have been developed, but users often suffer from information overload. With nearly 700 coronavirus disease 2019 (COVID-19) trials conducted in the United States as of August 2020, it is imperative to enable rapid recruitment to these studies. The COVID-19 Trial Finder was designed to facilitate patient-centered search of COVID-19 trials, first by location and radius distance from trial sites, and then by brief, dynamically generated medical questions to allow users to prescreen their eligibility for nearby COVID-19 trials with minimum human computer interaction. A simulation study using 20 publicly available patient case reports demonstrates its precision and effectiveness.


Subjects
COVID-19 , Clinical Trials as Topic , Abstracting and Indexing , Adult , Aged , Aged, 80 and over , Child, Preschool , Eligibility Determination , Female , Humans , Information Storage and Retrieval , Male , Middle Aged , Patient Selection
17.
Ann Thorac Surg ; 112(6): 2039-2045, 2021 12.
Article in English | MEDLINE | ID: mdl-33159864

ABSTRACT

BACKGROUND: The Physician Payments Sunshine Act was enacted to understand financial relationships with industry that might influence provider decisions. We investigated how industry payments within the congenital heart community relate to experience and reputation. METHODS: Congenital cardiothoracic surgeons and pediatric cardiologists were identified from the Open Payments Database. All payments from 2013 through 2017 were matched to affiliated hospitals' U.S. News & World Report (USNWR) rankings, The Society of Thoracic Surgeons-Congenital Heart Surgery Public Reporting Star Ratings, and Optum Center of Excellence (COE) designation. Surgeon payments were linked to years since terminal training. Univariable analyses were conducted. RESULTS: The median payment amount per surgeon ($71; interquartile range [IQR], $41-$99) was nearly double the median payment amount per cardiologist ($41; IQR, $18-$84; P < .05). For surgeons, median individual payment was 56% higher to payees at USNWR top 10 children's hospitals ($100; IQR, $28-$203) vs all others ($64; IQR, $23-$140; P < .001). For cardiologists, median individual payment was 26% higher to payees at USNWR top 10 children's hospitals ($73; IQR, $28-$197) vs all others ($58; IQR, $19-$140; P < .001). Findings were similar across The Society of Thoracic Surgeons-Congenital Heart Surgery star rankings and Optum Center of Excellence groups. By surgeon experience, surgeons 0 to 6 years posttraining (first quartile) received the highest number of median payments per surgeon (17 payments; IQR, 6.5-28 payments; P < .001). Surgeons 21 to 44 years posttraining (fourth quartile) received the lowest median individual payment ($51; IQR, $20-$132; P < .001). CONCLUSIONS: Industry payments vary by hospital reputation and provider experience. Such biases must be understood for self-governance and the delineation of conflict of interest policies that balance industry relationships with clinical innovation.


Subjects
Health Care Sector/economics , Heart Defects, Congenital/surgery , Industry/economics , Salaries and Fringe Benefits/economics , Surgeons/economics , Conflict of Interest/economics , Databases, Factual , Heart Defects, Congenital/economics , Humans , Retrospective Studies , United States
18.
Sci Data ; 7(1): 281, 2020 08 27.
Article in English | MEDLINE | ID: mdl-32855408

ABSTRACT

We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria.


Subjects
Clinical Trials, Phase IV as Topic , Humans
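
The abstract above states that each criterion is represented as a directed acyclic graph that can be transformed into Boolean logic. The Python sketch below illustrates that idea under a simplified, hypothetical representation in which operator nodes ("AND", "OR", "NOT") point to child nodes and leaves hold entity text. It does not reproduce Chia's actual annotation schema, entity types, or relationship types.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: an operator ("AND", "OR", "NOT") or a leaf entity."""
    label: str
    children: list = field(default_factory=list)


def to_boolean(node: Node) -> str:
    """Recursively render a node and its descendants as a Boolean expression."""
    if not node.children:
        return node.label  # leaf: entity text is emitted as-is
    if node.label == "NOT":
        return f"NOT {to_boolean(node.children[0])}"
    joiner = f" {node.label} "
    return "(" + joiner.join(to_boolean(child) for child in node.children) + ")"


# Hypothetical criterion: adults with type 2 diabetes and
# no history of renal failure or dialysis
criterion = Node("AND", [
    Node("age >= 18"),
    Node("type 2 diabetes"),
    Node("NOT", [Node("OR", [Node("renal failure"), Node("dialysis")])]),
])

print(to_boolean(criterion))
```

Because the structure is acyclic, the recursion always terminates; a node shared by several parents is simply rendered once per parent.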
19.
AMIA Annu Symp Proc ; 2020: 283-292, 2020.
Article in English | MEDLINE | ID: mdl-33936400

ABSTRACT

Rapidly increasing costs have been a major threat to our clinical research enterprise. Improvement in appointment scheduling is a crucial means to boost efficiency and save cost in clinical research and has been well studied in the outpatient setting. This study reviews nearly 5 years of usage data of an integrated scheduling system implemented at Columbia University/New York Presbyterian (CUIMC/NYP) called IMPACT and provides original insights into the challenges faced by a clinical research facility. Briefly, the IMPACT data shows that high rates of room and resource changes correlate with rescheduled appointments and that rescheduled visits are more likely to be attended than non-rescheduled visits. We highlight the differing roles of schedulers, coordinators, and investigators, and propose a highly accurate predictive model of participant no-shows in a research setting. This study sheds light on ways to reduce overall cost and improve the care we offer to clinical research participants.


Subjects
Appointments and Schedules , Delivery of Health Care/organization & administration , Hospitals , Algorithms , Ambulatory Care Facilities , Clinical Trials as Topic , Efficiency, Organizational , Health Care Costs , Humans , Male , New York , Outpatients
20.
J Biomed Inform ; 100: 103318, 2019 12.
Article in English | MEDLINE | ID: mdl-31655273

ABSTRACT

BACKGROUND: Manually curating standardized phenotypic concepts such as Human Phenotype Ontology (HPO) terms from narrative text in electronic health records (EHRs) is time consuming and error prone. Natural language processing (NLP) techniques can facilitate automated phenotype extraction and thus improve the efficiency of curating clinical phenotypes from clinical texts. While individual NLP systems can perform well for a single cohort, an ensemble-based method might shed light on increasing the portability of NLP pipelines across different cohorts. METHODS: We compared four NLP systems, MetaMapLite, MedLEE, ClinPhen and cTAKES, and four ensemble techniques, including intersection, union, majority-voting and machine learning, for extracting generic phenotypic concepts. We addressed two important research questions regarding automated phenotype recognition. First, we evaluated the performance of different approaches in identifying generic phenotypic concepts. Second, we compared the performance of different methods to identify patient-specific phenotypic concepts. To better quantify the effects caused by concept granularity differences on performance, we developed a novel evaluation metric that considered concept hierarchies and frequencies. Each of the approaches was evaluated on a gold standard set of clinical documents annotated by clinical experts. One dataset containing 1,609 concepts derived from 50 clinical notes from two different institutions was used in both evaluations, and an additional dataset of 608 concepts derived from 50 case report abstracts obtained from PubMed was used for evaluation of identifying generic phenotypic concepts only. RESULTS: For generic phenotypic concept recognition, the top three performers in the NYP/CUIMC dataset are union ensemble (F1, 0.634), training-based ensemble (F1, 0.632), and majority vote-based ensemble (F1, 0.622). In the Mayo dataset, the top three are majority vote-based ensemble (F1, 0.642), cTAKES (F1, 0.615), and MedLEE (F1, 0.559). In the PubMed dataset, the top three are majority vote-based ensemble (F1, 0.719), training-based (F1, 0.696) and MetaMapLite (F1, 0.694). For identifying patient specific phenotypes, the top three performers in the NYP/CUIMC dataset are majority vote-based ensemble (F1, 0.610), MedLEE (F1, 0.609), and training-based ensemble (F1, 0.585). In the Mayo dataset, the top three are majority vote-based ensemble (F1, 0.604), cTAKES (F1, 0.531) and MedLEE (F1, 0.527). CONCLUSIONS: Our study demonstrates that ensembles of natural language processing can improve both generic phenotypic concept recognition and patient specific phenotypic concept identification over individual systems. Among the individual NLP systems, each performed best when applied to the dataset that it was primarily designed for. However, combining multiple NLP systems to create an ensemble can generally improve the performance. Specifically, the ensemble can increase the reproducibility of results across different cohorts and tasks, and thus provide a more portable phenotyping solution compared to individual NLP systems.


Subjects
Natural Language Processing , Phenotype , Datasets as Topic , Electronic Health Records , Humans , Reproducibility of Results
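
Three of the four ensemble techniques named in the abstract above (intersection, union, and majority voting) are set operations over the concepts each NLP system extracts, and can be sketched in Python together with the F1 score used for evaluation. The concept labels, the strict-majority rule, and the function names are hypothetical; the machine learning (training-based) ensemble and the paper's hierarchy-aware metric are not sketched.

```python
from collections import Counter


def union_ensemble(outputs):
    """Keep a concept if any system extracted it."""
    return set().union(*outputs)


def intersection_ensemble(outputs):
    """Keep a concept only if every system extracted it."""
    return set.intersection(*map(set, outputs))


def majority_vote_ensemble(outputs):
    """Keep a concept if more than half of the systems extracted it."""
    counts = Counter(c for out in outputs for c in set(out))
    return {c for c, n in counts.items() if n > len(outputs) / 2}


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over concept sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three systems for one document, plus a gold standard
system_outputs = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]
gold = {"b", "c", "d"}

for name, ensemble in [
    ("union", union_ensemble),
    ("intersection", intersection_ensemble),
    ("majority", majority_vote_ensemble),
]:
    print(name, round(f1_score(ensemble(system_outputs), gold), 3))
```

In this toy case the union favors recall, the intersection favors precision, and majority voting sits between them, which is the trade-off the ensemble comparison in the abstract explores.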