1.
PLoS One ; 19(3): e0300817, 2024.
Article in English | MEDLINE | ID: mdl-38536822

ABSTRACT

INTRODUCTION: Bronchopulmonary dysplasia (BPD) poses a substantial global health burden. Individualized treatment strategies based on early prediction of the development of BPD can mitigate preterm birth complications; however, previously suggested predictive models lack early postnatal applicability. We aimed to develop predictive models for BPD and mortality based on immediate postnatal clinical data. METHODS: Clinical information on very preterm and very low birth weight infants born between 2008 and 2018 was extracted from a nationwide Japanese database. The gradient boosting decision trees (GBDT) algorithm was adopted to predict BPD and mortality, using predictors within the first 6 h postpartum. We assessed the temporal validity and evaluated model adequacy using Shapley additive explanations (SHAP) values. RESULTS: We developed three predictive models using data from 39,488, 39,096, and 40,291 infants to predict "death or BPD," "death or severe BPD," and "death before discharge," respectively. These well-calibrated models achieved areas under the receiver operating characteristic curve of 0.828 (95% CI: 0.828-0.828), 0.873 (0.873-0.873), and 0.887 (0.887-0.888), respectively, outperforming the multivariable logistic regression models. SHAP value analysis identified predictors of BPD, including gestational age, size at birth, male sex, and persistent pulmonary hypertension. In SHAP value-based case clustering, the "death or BPD" prediction model stratified infants by gestational age and persistent pulmonary hypertension, whereas the other models for "death or severe BPD" and "death before discharge" commonly formed clusters of low mortality, extreme prematurity, low Apgar scores, and persistent pulmonary hypertension of the newborn. CONCLUSIONS: GBDT models for predicting BPD and mortality, designed for use within 6 h postpartum, demonstrated superior prognostic performance. 
SHAP value-based clustering, a data-driven approach, formed clusters of clinical relevance. These findings suggest the efficacy of a GBDT algorithm for the early postnatal prediction of BPD.


Subjects
Bronchopulmonary Dysplasia , Pulmonary Hypertension , Premature Birth , Infant , Female , Humans , Newborn Infant , Pregnancy , Bronchopulmonary Dysplasia/diagnosis , Bronchopulmonary Dysplasia/epidemiology , Bronchopulmonary Dysplasia/complications , Japan/epidemiology , Extremely Premature Infant , Pulmonary Hypertension/complications , Very Low Birth Weight Infant , Gestational Age , Decision Trees
2.
Stud Health Technol Inform ; 310: 559-563, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269871

ABSTRACT

Important pieces of information related to patient symptoms and diagnosis are often written in free-text form in clinical texts. To utilize these texts, information extraction using natural language processing is required. This study evaluated the performance of named entity recognition (NER) and relation extraction (RE) using machine-learning methods. This study used a Japanese case report corpus that was manually annotated with 113 types of entities and 36 types of relations. There were 183 cases comprising 2,194 sentences after preprocessing. A machine learning model based on bidirectional encoder representations from transformers was used. The results revealed that the maximum micro-averaged F1 scores of NER and RE were 0.912 and 0.759, respectively. The results of this study are comparable to those of previous studies. Hence, these results could serve as a substantial accuracy baseline.


Subjects
Electric Power Supplies , Writing , Humans , Japan , Information Storage and Retrieval , Machine Learning
3.
Stud Health Technol Inform ; 310: 715-719, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269902

ABSTRACT

Transformation of patient data extracted from a database into fixed-length numerical vectors requires expertise in topical medical knowledge as well as data manipulation; thus, manual feature design is labor-intensive. In this study, we propose a machine learning-based method for this purpose, applicable to electronic medical data recorded during hospitalization, which utilizes unsupervised feature extraction based on graph embedding. Unsupervised learning is performed on a heterogeneous graph using Graph2Vec, and the inclusion of clinically useful data in the obtained embedding representation is evaluated by using it to predict readmission within 30 days of discharge. The embedded representations are observed to improve predictive performance significantly as the information contained in the graph increases, indicating the suitability of the proposed method for feature design corresponding to clinical information.


Subjects
Medical Records , Records , Humans , Factual Databases , Hospitalization , Knowledge
4.
JMIR Form Res ; 7: e45867, 2023 Sep 05.
Article in English | MEDLINE | ID: mdl-37669092

ABSTRACT

BACKGROUND: As of December 2022, the outbreak of COVID-19 showed no sign of abating, continuing to impact people's lives, livelihoods, economies, and more. Vaccination is an effective way to achieve mass immunity. However, in places such as Japan, where vaccination is voluntary, there are people who choose not to receive the vaccine, even if an effective vaccine is offered. To promote vaccination, it is necessary to clarify what kind of information on social media can influence attitudes toward vaccines. OBJECTIVE: False rumors and counterrumors are often posted and spread in large numbers on social media, especially during emergencies. In this paper, we regard tweets that contain questions or point out errors in information as counterrumors. We analyze counterrumor tweets related to the COVID-19 vaccine on Twitter. We aimed to answer the following questions: (1) what kinds of COVID-19 vaccine-related counterrumors were posted on Twitter, and (2) are the posted counterrumors related to social conditions such as vaccination status? METHODS: We use the following data sets: (1) counterrumors automatically collected by the "rumor cloud" (18,593 tweets); and (2) the number of COVID-19 vaccine inoculators from September 27, 2021, to August 15, 2022, published on the Prime Minister's Office's website. First, we classified the contents contained in counterrumors. Second, we counted the number of COVID-19 vaccine-related counterrumors from data set 1. Then, we examined the cross-correlation coefficients between the numbers of data sets 1 and 2. Through this verification, we examined the correlation coefficients for the following three periods: (1) the same period of data; (2) the case where the occurrence of counterrumors precedes the vaccination (negative time lag); and (3) the case where the vaccination precedes the occurrence of counterrumors (positive time lag). The data period used for the validation was from October 4, 2021, to April 18, 2022. 
RESULTS: Our classification results showed that most counterrumors about the COVID-19 vaccine were negative. Moreover, the correlation coefficients between the number of counterrumors and vaccine inoculators showed significant and strong positive correlations. The correlation coefficient was over 0.7 at -8, -7, and -1 weeks of lag. Results suggest that the number of vaccine inoculators tended to increase with an increase in the number of counterrumors. Significant correlation coefficients of 0.5 to 0.6 were observed for lags of 1 week or more and 2 weeks or more. This implies that an increase in vaccine inoculators increases the number of counterrumors. These results suggest that the increase in the number of counterrumors may have been a factor in inducing vaccination behavior. CONCLUSIONS: Using quantitative data, we were able to reveal how counterrumors influence the vaccination status of the COVID-19 vaccine. We think that our findings would be a foundation for considering countermeasures of vaccination.
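
The lagged cross-correlation described in the METHODS of this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy weekly counts, and the sign convention (a lag of k pairs the counterrumor count of week t + k with the vaccination count of week t, so a negative lag means the counterrumors come first) are assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lagged_correlation(counterrumors, vaccinations, lag):
    """Correlate counterrumors[t + lag] with vaccinations[t].

    Assumed convention: lag < 0 means counterrumors precede vaccinations,
    lag > 0 means vaccinations precede counterrumors.
    """
    if lag < 0:
        xs, ys = counterrumors[:lag], vaccinations[-lag:]
    elif lag > 0:
        xs, ys = counterrumors[lag:], vaccinations[:-lag]
    else:
        xs, ys = counterrumors, vaccinations
    return pearson(xs, ys)


# Hypothetical weekly counts: vaccinations follow counterrumors by one week.
weekly_counterrumors = [1, 3, 2, 5, 4, 6, 8, 7]
weekly_vaccinations = [0] + weekly_counterrumors[:-1]

for lag in (-2, -1, 0, 1, 2):
    r = lagged_correlation(weekly_counterrumors, weekly_vaccinations, lag)
    print(lag, round(r, 3))
```

In this toy series the coefficient peaks at a lag of -1, which mirrors how a peak at a negative lag is read in the abstract: the counterrumor series leads the vaccination series.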

5.
Cancer Sci ; 114(4): 1710-1717, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36601953

ABSTRACT

Comprehensive cancer genome profiling (CGP) has been nationally reimbursed in Japan since June 2019. Less than 10% of the patients have been reported to undergo recommended treatment. Todai OncoPanel (TOP) is a dual DNA-RNA panel as well as a paired tumor-normal matched test. Two hundred patients underwent TOP as part of Advanced Medical Care B with approval from the Ministry of Health, Labour and Welfare between September 2018 and December 2019. Tests were carried out in patients with cancers without standard treatment or when patients had already undergone standard treatment. Data from DNA and RNA panels were analyzed in 198 and 191 patients, respectively. The percentage of patients who were given therapeutic or diagnostic recommendations was 61% (120/198). One hundred and four samples (53%) harbored gene alterations that were detected with the DNA panel and had potential treatment implications, and 14 samples (7%) had a high tumor mutational burden. Twenty-two samples (11.1%) harbored 30 fusion transcripts or MET exon 14 skipping that were detected by the RNA panel. Of those 30 transcripts, 6 had treatment implications and 4 had diagnostic implications. Thirteen patients (7%) were found to have pathogenic or likely pathogenic germline variants and genetic counseling was recommended. Overall, 12 patients (6%) received recommended treatment. In summary, patients benefited from both TOP DNA and RNA panels while following the same indication as the approved CGP tests. (UMIN000033647).


Subjects
Genomics , Neoplasms , Humans , Japan , Neoplasms/drug therapy , Neoplasms/genetics , Precision Medicine
6.
AMIA Annu Symp Proc ; 2023: 618-623, 2023.
Article in English | MEDLINE | ID: mdl-38222342

ABSTRACT

The diversity of patient information recorded in electronic medical records generally presents a challenge for converting it into fixed-length vectors that align with clinical characteristics. To address this issue, this study aimed to utilize an unsupervised graph representation learning method to transform the unstructured inpatient information from electronic medical records into a fixed-length vector. Infograph, one of the unsupervised graph representation learning algorithms, was applied to the graphed inpatient information, resulting in embedded vectors of fixed length. The embedded vectors were then evaluated for whether the clinical information was preserved in them. The results indicated that the embedded representation contained information that could predict readmission within 30 days, demonstrating the feasibility of using unsupervised graph representation learning to transform patient information into fixed-length vectors that retain clinical characteristics.


Subjects
Algorithms , Electronic Health Records , Humans
7.
Diagnostics (Basel) ; 12(12)2022 Nov 25.
Article in English | MEDLINE | ID: mdl-36552963

ABSTRACT

The histopathological findings of the glomeruli from whole slide images (WSIs) of a renal biopsy play an important role in diagnosing and grading kidney disease. This study aimed to develop an automated computational pipeline to detect glomeruli and to segment the histopathological regions inside of the glomerulus in a WSI. In order to assess the significance of this pipeline, we conducted a multivariate regression analysis to determine whether the quantified regions were associated with the prognosis of kidney function in 46 cases of immunoglobulin A nephropathy (IgAN). The developed pipelines showed a mean intersection over union (IoU) of 0.670 and 0.693 for five classes (i.e., background, Bowman's space, glomerular tuft, crescentic, and sclerotic regions) against the WSI of its facility, and 0.678 and 0.609 against the WSI of the external facility. The multivariate analysis revealed that the predicted sclerotic regions, even those that were predicted by the external model, had a significant negative impact on the slope of the estimated glomerular filtration rate after biopsy. This is the first study to demonstrate that the quantified sclerotic regions that are predicted by an automated computational pipeline for the segmentation of the histopathological glomerular components on WSIs impact the prognosis of kidney function in patients with IgAN.
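
The mean intersection over union (IoU) reported in this abstract can be illustrated with a short Python sketch. It assumes the common per-class definition (pixels labeled with the class in both maps, divided by pixels labeled with the class in either map) averaged over classes. The flat label lists and the function names are hypothetical, and the abstract does not state how the authors averaged across images.

```python
def class_iou(predicted, reference, cls):
    """IoU of one class between two equal-length label sequences.

    Returns None when the class appears in neither sequence.
    """
    intersection = sum(
        1 for p, r in zip(predicted, reference) if p == cls and r == cls
    )
    union = sum(
        1 for p, r in zip(predicted, reference) if p == cls or r == cls
    )
    return intersection / union if union else None


def mean_iou(predicted, reference, classes):
    """Average the per-class IoU over the classes that actually occur."""
    scores = [class_iou(predicted, reference, c) for c in classes]
    scores = [s for s in scores if s is not None]
    if not scores:
        raise ValueError("none of the classes occur in either sequence")
    return sum(scores) / len(scores)


# Hypothetical flattened label maps: 0 = background, 1 = glomerular tuft.
predicted_labels = [0, 0, 1, 1]
reference_labels = [0, 1, 1, 1]
print(mean_iou(predicted_labels, reference_labels, classes=[0, 1]))
```

Skipping classes that are absent from both maps is one design choice among several; counting them as a perfect or a zero score would change the mean.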

8.
J Biomed Inform ; 134: 104200, 2022 10.
Article in English | MEDLINE | ID: mdl-36089198

ABSTRACT

In clinical records, much of the clinical information is recorded as free text, thus necessitating the use of advanced automatic information extraction technology. The development of practical technologies requires a corpus with finer-granularity annotations that describe the information in the corpus, but such annotation criteria have not been researched enough thus far. This study aimed to develop fine-grained annotation criteria that exhaustively cover patients' states in case reports. We collected 362 case reports, written in Japanese, of intractable diseases that were expected to contain a broad range of patients' states. Criteria were developed by repeatedly revising and annotating the clinical case report text. A set of annotation criteria for patients' states, consisting of 46 entity types, 9 attributes, and 36 relations, was obtained. It allows more detailed information to be expressed than in previous studies through a broader range of concept types, including treatment, and captures clinical information based on a combination of causality and judgment, which could not be expressed before.


Subjects
Information Storage and Retrieval , Natural Language Processing , Humans
9.
JMIR Med Inform ; 10(7): e37913, 2022 Jul 27.
Article in English | MEDLINE | ID: mdl-35896017

ABSTRACT

BACKGROUND: Falls may cause elderly people to be bedridden, requiring professional intervention; thus, fall prevention is crucial. The use of electronic health records (EHRs) is expected to provide highly accurate risk assessment and length-of-stay data related to falls, which may be used to estimate the costs and benefits of prevention. However, no studies to date have investigated the extent to which hospital stays could be shortened through fall avoidance resulting from the use of prediction tools. OBJECTIVE: We first estimated the extended length of hospital stay caused by falls among elderly inpatients. Next, we developed a model that predicts falls using clinical text as input and evaluated its accuracy. Finally, we estimated the potentially shortened hospital stay that would be made possible by appropriate interventions based on the prediction model. METHODS: Patients aged 65 years or older were selected as subjects, and the EHRs of 1728 falls and 70,586 nonfalls were subjected to analysis. The extended-stay lengths were estimated using propensity score matching of 49 associated variables. Bidirectional encoder representations from transformers and bidirectional long short-term memory methods were used to predict falls from clinical text. The estimated length of stay and the outputs of the prediction model were used to determine stay reductions. RESULTS: The extended length of hospital stay due to falls was estimated to be 17.8 days (95% CI 16.6-19.0), which dropped to 8.6 days when there were unobserved covariates at an odds ratio of 2.0. The accuracy of the prediction model was as follows: area under the receiver operating characteristic curve, 0.851; F-value, 0.165; recall, 0.737; precision, 0.093; and specificity, 0.839. When assuming interventions with 25% or 100% effectiveness against cases where the model predicted a fall, the stay reduction was estimated at 0.022 and 0.099 days/day, respectively. 
CONCLUSIONS: The accuracy of the prediction model using clinical text is considered to be higher than the prediction accuracy of conventional assessments. However, our model's precision remained low at 9.3%. This may be due, in part, to the inclusion of cases in which falls did not occur because of preventative interventions during hospitalization. Nonetheless, it is estimated that interventions for cases when falls were predicted will reduce medical costs by 886 Yen/day (~US $6.50/day) of intervention, even if the preventative effect is 25%. Limitations include the fact that these results cannot be extrapolated to short- or long-term hospitalization cases, and that this was a single-center study.
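
The reported F-value in this abstract can be checked against the reported precision and recall with a few lines of Python. This assumes the F-value is the F1 score, that is, the harmonic mean of precision and recall, which the abstract does not state explicitly. The function name is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the RESULTS of the abstract.
reported_precision = 0.093
reported_recall = 0.737

print(round(f_score(reported_precision, reported_recall), 3))  # prints 0.165
```

Under that assumption the result agrees with the reported F-value of 0.165, and it shows how the low precision pulls the score down even though recall is high.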

10.
PLoS One ; 17(6): e0269570, 2022.
Article in English | MEDLINE | ID: mdl-35749395

ABSTRACT

Deep learning techniques have recently been applied to analyze associations between gene expression data and disease phenotypes. However, there are concerns regarding the black box problem: it is difficult to interpret why the prediction results are obtained using deep learning models from model parameters. New methods have been proposed for interpreting deep learning model predictions but have not been applied to genetics. In this study, we demonstrated that applying SHapley Additive exPlanations (SHAP) to a deep learning model using graph convolutions of genetic pathways can provide pathway-level feature importance for classification prediction of diffuse large B-cell lymphoma (DLBCL) gene expression subtypes. Using Kyoto Encyclopedia of Genes and Genomes pathways, a graph convolutional network (GCN) model was implemented to construct graphs with nodes and edges. DLBCL datasets, including microarray gene expression data and clinical information on subtypes (germinal center B-cell-like type and activated B-cell-like type), were retrieved from the Gene Expression Omnibus to evaluate the model. The GCN model showed an accuracy of 0.914, precision of 0.948, recall of 0.868, and F1 score of 0.906 in analysis of the classification performance for the test datasets. The pathways with high feature importance by SHAP included highly enriched pathways in the gene set enrichment analysis. Moreover, a logistic regression model with explanatory variables of genes in pathways with high feature importance showed good performance in predicting DLBCL subtypes. In conclusion, our GCN model for classifying DLBCL subtypes is useful for interpreting important regulatory pathways that contribute to the prediction.


Subjects
Diffuse Large B-Cell Lymphoma , Gene Expression , Germinal Center/pathology , Humans , Diffuse Large B-Cell Lymphoma/genetics , Diffuse Large B-Cell Lymphoma/pathology , Microarray Analysis , Phenotype
11.
PLoS One ; 16(11): e0259763, 2021.
Article in English | MEDLINE | ID: mdl-34752490

ABSTRACT

Generalized language models that are pre-trained with a large corpus have achieved great performance on natural language tasks. While many pre-trained transformers for English are published, few models are available for Japanese text, especially in clinical medicine. In this work, we demonstrate the development of a clinical specific BERT model with a huge amount of Japanese clinical text and evaluate it on the NTCIR-13 MedWeb that has fake Twitter messages regarding medical concerns with eight labels. Approximately 120 million clinical texts stored at the University of Tokyo Hospital were used as our dataset. The BERT-base was pre-trained using the entire dataset and a vocabulary including 25,000 tokens. The pre-training was almost saturated at about 4 epochs, and the accuracies of Masked-LM and Next Sentence Prediction were 0.773 and 0.975, respectively. The developed BERT did not show significantly higher performance on the MedWeb task than the other BERT models that were pre-trained with Japanese Wikipedia text. The advantage of pre-training on clinical text may become apparent in more complex tasks on actual clinical text, and such an evaluation set needs to be developed.


Subjects
Language , Clinical Medicine , Electric Power Supplies , Japan , Text Messaging
12.
Kidney Int Rep ; 6(3): 716-726, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33732986

ABSTRACT

INTRODUCTION: Diagnosing renal pathologies is important for performing treatments. However, classifying every glomerulus is difficult for clinicians; thus, a support system, such as a computer, is required. This paper describes the automatic classification of glomerular images using a convolutional neural network (CNN). METHOD: To generate appropriate labeled data, annotation criteria including 12 features (e.g., "fibrous crescent") were defined. The concordance among 5 clinicians was evaluated for 100 images using the kappa (κ) coefficient for each feature. Using the annotation criteria, 1 clinician annotated 10,102 images. We trained the CNNs to classify the features with an average κ ≥0.4 and evaluated their performance using the receiver operating characteristic-area under the curve (ROC-AUC). An error analysis was conducted and the gradient-weighted class activation mapping (Grad-CAM) was also applied; it expresses the CNN's focusing point with a heat map when the CNN classifies the glomerular image for a feature. RESULTS: The average κ coefficient of the features ranged from 0.28 to 0.50. The ROC-AUC of the CNNs for test data varied from 0.65 to 0.98. Among the features, "capillary collapse" and "fibrous crescent" had high ROC-AUC values of 0.98 and 0.91, respectively. The error analysis and the Grad-CAM visually showed that the CNN could not distinguish between 2 different features that had similar visual structures or that occurred simultaneously. CONCLUSION: The differences in the texture or frequency of the co-occurrence between the different features affected the CNN performance; thus, to improve the classification accuracy, methods such as segmentation are required.

13.
PLoS One ; 16(2): e0246640, 2021.
Article in English | MEDLINE | ID: mdl-33544775

ABSTRACT

Risk assessment of in-hospital mortality of patients at the time of hospitalization is necessary for determining the scale of required medical resources for the patient depending on the patient's severity. Because recent machine learning application in the clinical area has been shown to enhance prediction ability, applying this technique to this issue can lead to an accurate prediction model for in-hospital mortality prediction. In this study, we aimed to generate an accurate prediction model of in-hospital mortality using machine learning techniques. Patients 18 years of age or older admitted to the University of Tokyo Hospital between January 1, 2009 and December 26, 2017 were used in this study. The data were divided into a training/validation data set (n = 119,160) and a test data set (n = 33,970) according to the time of admission. The prediction target of the model was the in-hospital mortality within 14 days. To generate the prediction model, 25 variables (age, sex, 21 laboratory test items, length of stay, and mortality) were used to predict in-hospital mortality. Logistic regression, random forests, multilayer perceptron, and gradient boost decision trees were performed to generate the prediction models. To evaluate the prediction capability of the model, the model was tested using a test data set. Mean probabilities obtained from trained models with five-fold cross-validation were used to calculate the area under the receiver operating characteristic (AUROC) curve. In a test stage using the test data set, prediction models of in-hospital mortality within 14 days showed AUROC values of 0.936, 0.942, 0.942, and 0.938 for logistic regression, random forests, multilayer perceptron, and gradient boosting decision trees, respectively. 
Machine learning-based prediction of short-term in-hospital mortality using admission laboratory data showed outstanding prediction capability and, therefore, has the potential to be useful for the risk assessment of patients at the time of hospitalization.


Subjects
Routine Diagnostic Tests/methods , Electronic Health Records/standards , Hospital Mortality/trends , Hospitalization/statistics & numerical data , Machine Learning , Risk Assessment/methods , Factual Databases/statistics & numerical data , Female , Humans , Male , Middle Aged , ROC Curve , Retrospective Studies
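
The evaluation step in the in-hospital mortality abstract above (averaging the probabilities from the five cross-validation models, then computing the AUROC on the test set) can be sketched in plain Python. The function names and the toy numbers are hypothetical. The AUROC here uses the pairwise-ranking definition, which is one standard way to compute it and not necessarily the one the authors used.

```python
def mean_probabilities(fold_predictions):
    """Average predicted probabilities across models, sample by sample.

    fold_predictions holds one list of probabilities per trained model,
    all in the same sample order.
    """
    n_models = len(fold_predictions)
    return [sum(column) / n_models for column in zip(*fold_predictions)]


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical test-set predictions from two of the fold models.
fold_outputs = [
    [0.10, 0.30, 0.40, 0.90],
    [0.10, 0.50, 0.30, 0.70],
]
test_labels = [0, 0, 1, 1]
ensemble_scores = mean_probabilities(fold_outputs)
print(auroc(test_labels, ensemble_scores))
```

The pairwise form is quadratic in the number of samples, so a rank-based implementation would be preferable for a test set of the size reported in the abstract.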
14.
Drug Saf ; 42(9): 1055-1069, 2019 09.
Article in English | MEDLINE | ID: mdl-31119651

ABSTRACT

INTRODUCTION: Patients often take several different medications for multiple conditions concurrently. Therefore, when adverse drug events (ADEs) occur, it is necessary to consider the mechanisms responsible. Few approaches consider the mechanisms of ADEs, such as changes in physiological states. We proposed that the ontological framework for pharmacology and mechanism of action (pharmacodynamics) we developed could be used for this approach. However, the existing knowledge base contains little data on physiological chains (PCs). OBJECTIVE: We aimed to investigate a method for automatically generating missing PC from the viewpoint of anatomical structures. This study was conducted to determine dysuria-related adverse events more likely to occur during multidrug administration. METHODS: We adopted a systematic approach to determine drugs suspected to cause adverse events and incorporated existing data and data generated in our newly developed method into our ontological framework. The performance of automated data generation was evaluated using this newly developed system. Suspected drugs determined by the system were compared with those derived from adverse events databases. RESULTS: Of the 242 drugs involving suspected drug-induced urinary retention or dysuria, 26 suspected drugs were determined. Of these, five were drugs with side effects not listed in drug package inserts. The system derived potential mechanisms of action, PCs, and suspected drugs. CONCLUSION: Our method is novel in that it generates PC data from anatomical structural properties and could serve as a knowledge base for determining suspected drugs by potential mechanisms of action.


Subjects
Adverse Drug Reaction Reporting Systems/statistics & numerical data , Drug-Related Side Effects and Adverse Reactions/epidemiology , Dysuria/chemically induced , Urinary Retention/chemically induced , Factual Databases , Humans , Pharmaceutical Preparations/administration & dosage
15.
Int J Med Inform ; 124: 90-96, 2019 04.
Article in English | MEDLINE | ID: mdl-30784432

ABSTRACT

OBJECTIVES: Electronic health record (EHR)-based phenotyping is an automated technique for identifying patients diagnosed with a particular disease using EHR data. However, EHR-based phenotyping has difficulties in achieving satisfactorily high performance because clinical notes include disease mentions that ultimately signify something other than the patient's diagnosis (such as differential diagnosis or screening). Our objective is to quantify the influence of such disease mentions on EHR-based phenotyping performance. METHODS: Physicians manually reviewed whether the disease mentions indicated the patients' diseases in 487,300 clinical notes of 4,430 patients. Particular focus was placed on disease mentions that did not signify the patient's diagnosis even though they did not have any syntactic modifier or indicator in the same sentences. Patients were then classified according to whether their clinical notes included such disease mentions. RESULTS: Among the patients whose clinical notes included disease mentions without any modifier or indicator, the proportion of patients whose disease mentions signified the patients' diagnosis was 78.1% (on average). This value can be interpreted as the bias of disease mentions that did not signify the patient's diagnosis on the precision of EHR-based phenotyping by extracting disease mentions from clinical notes. CONCLUSION: This study quantified the bias that disease mentions incorrectly signifying a patient's diagnosis introduce into the precision of EHR-based phenotyping from four dataset types. The results of this study will help researchers in diverse research environments with different available data types.


Subjects
Diagnosis , Electronic Health Records , Differential Diagnosis , Diffusion of Innovation , Humans , Physicians' Practice Patterns , Reproducibility of Results
16.
J Diabetes Sci Technol ; 11(4): 791-799, 2017 07.
Article in English | MEDLINE | ID: mdl-27932531

ABSTRACT

BACKGROUND: Phenotyping is an automated technique that can be used to distinguish patients based on electronic health records. To improve the quality of medical care and advance type 2 diabetes mellitus (T2DM) research, the demand for T2DM phenotyping has been increasing. Some existing phenotyping algorithms are not sufficiently accurate for screening or identifying clinical research subjects. OBJECTIVE: We propose a practical phenotyping framework using both expert knowledge and a machine learning approach to develop 2 phenotyping algorithms: one is for screening; the other is for identifying research subjects. METHODS: We employ expert knowledge as rules to exclude obvious control patients and machine learning to increase accuracy for complicated patients. We developed phenotyping algorithms on the basis of our framework and performed binary classification to determine whether a patient has T2DM. To facilitate development of practical phenotyping algorithms, this study introduces new evaluation metrics: area under the precision-sensitivity curve (AUPS) with a high sensitivity and AUPS with a high positive predictive value. RESULTS: The proposed phenotyping algorithms based on our framework show higher performance than baseline algorithms. Our proposed framework can be used to develop 2 types of phenotyping algorithms depending on the tuning approach: one for screening, the other for identifying research subjects. CONCLUSIONS: We develop a novel phenotyping framework that can be easily implemented on the basis of proper evaluation metrics, which are in accordance with users' objectives. The phenotyping algorithms based on our framework are useful for extraction of T2DM patients in retrospective studies.


Subjects
Type 2 Diabetes Mellitus/diagnosis , Electronic Health Records , Support Vector Machine , Area Under Curve , Humans , Phenotype , ROC Curve , Sensitivity and Specificity
17.
Stud Health Technol Inform ; 245: 432-436, 2017.
Article in English | MEDLINE | ID: mdl-29295131

ABSTRACT

Phenotyping is an automated technique for identifying patients diagnosed with a particular disease based on electronic health records (EHRs). To evaluate phenotyping algorithms, which should be reproducible, the annotation of EHRs as a gold standard is critical. However, we have found that the different types of EHRs cannot be definitively annotated into CASEs or CONTROLs. The influence of such "possible patients" on phenotyping algorithms is unknown. To assess these issues, for four chronic diseases, we annotated EHRs by using information not directly referring to the diseases and developed two types of phenotyping algorithms for each disease. We confirmed that each disease included different types of possible patients. The performance of phenotyping algorithms differed depending on whether possible patients were considered as CASEs, and this was independent of the type of algorithms. Our results indicate that researchers must share annotation criteria for classifying the possible patients to reproduce phenotyping algorithms.


Subjects
Algorithms , Electronic Health Records , Phenotype , Humans
18.
Stud Health Technol Inform ; 245: 882-886, 2017.
Article in English | MEDLINE | ID: mdl-29295226

ABSTRACT

To estimate a diagnostic probability similarly to experts using answers to interviews, we developed a system that fundamentally behaves as a Bayesian model. For predefined interviews, we defined the sensitivity and specificity related to one or more diagnoses. Additionally, we used a predefined parent-child relation between diagnoses to decrease the number of parameters to set. After calculating the disease probability, we trained the model using the difference of post-test probability between computer calculations and three experts' opinions. We evaluated the effects of setting up tree structures. When using a tree structure, the model trained faster and produced better fitting results than the model without tree structure. Training with multiple raters' training data confused the model. The scores worsened in later epochs. Herein, we present the new method's benefits and characteristics.


Subjects
Bayes Theorem, Decision Support Techniques, Expert Testimony, Humans, Probability, Sensitivity and Specificity
19.
JMIR Med Inform ; 4(2): e12, 2016 Apr 05.
Article in English | MEDLINE | ID: mdl-27050304

ABSTRACT

BACKGROUND: Health Level Seven version 2.5 (HL7 v2.5) is a widespread messaging standard for information exchange between clinical information systems. By applying Semantic Web technologies to HL7 v2.5 messages, it is possible to integrate large-scale clinical data with life science knowledge resources. OBJECTIVE: To show the feasibility of a querying method over large-scale Resource Description Framework (RDF)-ized HL7 v2.5 messages using publicly available drug databases. METHODS: We developed a method to convert HL7 v2.5 messages into RDF. We also converted five kinds of drug databases into RDF and provided explicit links between the corresponding items among them. With those linked drug data, we then developed a method for query expansion to search the clinical data using semantic information on drug classes along with four types of temporal patterns. For evaluation, medication orders and laboratory test results for a 3-year period at the University of Tokyo Hospital were used, and the query execution times were measured. RESULTS: Approximately 650 million RDF triples for medication orders and 790 million RDF triples for laboratory test results were converted. Taking three types of query from use cases for detecting adverse drug events as examples, we confirmed that these queries were represented in SPARQL Protocol and RDF Query Language (SPARQL) using our methods, and we compared them with conventional query expressions. The measurement results confirm that the query time is feasible and increases logarithmically or linearly with the amount of data, without diverging. CONCLUSIONS: The proposed methods enabled query expressions that separate knowledge resources and clinical data, thereby suggesting the feasibility of improving the usability of clinical data by enhancing the knowledge resources. We also demonstrate that when HL7 v2.5 messages are automatically converted into RDF, searches are still possible through SPARQL without modifying the structure. As such, the proposed method benefits not only our hospitals, but also numerous hospitals that handle HL7 v2.5 messages. Our approach highlights the potential of large-scale data federation techniques to retrieve clinical information, which could be applied in clinical intelligence applications to improve clinical practices, such as adverse drug event monitoring and cohort selection for a clinical study, as well as to discover new knowledge from clinical information.

20.
Mol Clin Oncol ; 3(5): 1053-1057, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26623049

ABSTRACT

A number of previous studies have reported that 30-50% of patients with colorectal cancer (CRC) harbor Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations, which are a major predictive biomarker of resistance to epidermal growth factor receptor (EGFR)-targeted therapy. Treatment with an EGFR inhibitor is recommended for patients with KRAS wild-type metastatic colorectal cancer (mCRC). A recent retrospective study of cetuximab reported that patients with KRAS p.G13D mutations had better outcomes compared with those with other mutations. The aim of this retrospective study was to assess the prevalence of KRAS p.G13D mutations and evaluate the effectiveness of cetuximab in mCRC patients with KRAS p.G13D or other KRAS mutations. We reviewed the clinical records of 98 mCRC patients with KRAS mutations who were treated between August 2004 and January 2011 in four hospitals located in Tokyo and Kyushu Island. We also investigated KRAS mutation subtypes and patient characteristics. In the patients who received cetuximab, univariate and multivariate analyses were performed to assess the effect of KRAS p.G13D mutations on progression-free survival (PFS) and overall survival (OS). Of the 98 patients, 23 (23.5%) had KRAS p.G13D-mutated tumors, whereas 75 (76.5%) had tumors harboring other mutations. Of the 31 patients who received cetuximab, 9 (29.0%) had KRAS p.G13D mutations and 22 (71.0%) had other mutations. There were no significant differences in age, gender, primary site, pathological type, history of chemotherapy, or the combined use of irinotecan between the two patient subgroups. The univariate analysis revealed no significant difference in PFS or OS between the patients with KRAS p.G13D mutations and those with other mutations (median PFS, 4.5 vs. 2.8 months, respectively; P=0.65; and median OS, 15.3 vs. 8.9 months, respectively; P=0.51). However, the multivariate analysis revealed a trend toward better PFS among patients harboring p.G13D mutations (PFS: HR=0.29; 95% CI: 0.08-1.10; P=0.07; OS: HR=0.23; 95% CI: 0.04-1.54; P=0.13). In conclusion, treatment with cetuximab may be more clinically beneficial in mCRC patients with a KRAS p.G13D mutation, compared with those harboring other mutations. However, further investigation is required to clearly determine the benefits of cetuximab treatment in patients with KRAS p.G13D mutation-positive mCRC.
