Results 1 - 20 of 23
1.
J Biomed Inform ; 106: 103426, 2020 06.
Article in English | MEDLINE | ID: mdl-32339747

ABSTRACT

With the rise of deep learning, several recent studies on deep learning-based methods for electronic health records (EHR) successfully address real-world clinical challenges by utilizing effective representations of medical entities. However, existing EHR representation learning methods that focus on only diagnosis codes have limited clinical value, because such structured codes cannot concretely describe patients' medical conditions, and furthermore, some of the codes assigned to patients contain errors and inconsistency; this is one of the well-known caveats in the EHR. To overcome this limitation, in this paper, we fuse more detailed and accurate information in the form of natural language provided by unstructured clinical data sources (i.e., clinical notes). We propose HORDE, a unified graph representation learning framework to embed heterogeneous medical entities into a harmonized space for further downstream analyses as well as robustness to inconsistency in structured codes. Our extensive experiments demonstrate that HORDE significantly improves the performances of conventional clinical tasks such as subsequent code prediction and patient severity classification compared to existing methods, and also show the promising results of a novel EHR analysis about the consistency of each diagnosis code assignment.


Subject(s)
Electronic Health Records, Machine Learning, Humans
2.
Bioinformatics ; 31(22): 3653-9, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-26209432

ABSTRACT

MOTIVATION: As the quantity of genomic mutation data increases, the likelihood of finding patients with similar genomic profiles, for various disease inferences, increases. However, so does the difficulty in identifying them. Similarity search based on patient mutation profiles can solve various translational bioinformatics tasks, including prognostics and treatment efficacy predictions for better clinical decision making through large volume of data. However, this is a challenging problem due to heterogeneous and sparse characteristics of the mutation data as well as their high dimensionality. RESULTS: To solve this problem we introduce a compact representation and search strategy based on Gene-Ontology and orthogonal non-negative matrix factorization. Statistical significance between the identified cancer subtypes and their clinical features are computed for validation; results show that our method can identify and characterize clinically meaningful tumor subtypes comparable or better in most datasets than the recently introduced Network-Based Stratification method while enabling real-time search. To the best of our knowledge, this is the first attempt to simultaneously characterize and represent somatic mutational data for efficient search purposes. AVAILABILITY: The implementations are available at: https://sites.google.com/site/postechdm/research/implementation/orgos. CONTACT: sael@cs.stonybrook.edu or hwanjoyu@postech.ac.kr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Gene Ontology, Mutation/genetics, Chi-Square Distribution, Humans, Neoplasms/genetics, Reproducibility of Results, Survival Analysis
3.
BMC Med Inform Decis Mak ; 16 Suppl 1: 63, 2016 07 18.
Article in English | MEDLINE | ID: mdl-27453983

ABSTRACT

BACKGROUND: Prostate specific antigen (PSA) is an important biomarker to monitor the response to the treatment, but has not been fully utilized as a whole sequence. We used a longitudinal biomarker PSA to discover a new prognostic pattern that predicts castration-resistant prostate cancer (CRPC) after androgen deprivation therapy. METHODS: We transformed the longitudinal PSA into a discrete sequence, used frequent sequential pattern mining to find candidate patterns from the sequences, and selected the most predictive and informative pattern among the candidates. RESULTS: Patients were less likely to be CRPC if, after PSA values reach nadir, the PSA decreases more than 0.048 ng/ml during a month, and the decrease occurs again. This pattern significantly increased the accuracy of predicting CRPC by supplementing information provided by existing PSA patterns such as pretreatment PSA. CONCLUSIONS: This result can help clinicians to stratify men by the risk of CRPC and to determine the patient that needs intensive follow-up.


Subject(s)
Androgen Antagonists/therapeutic use, Tumor Biomarkers/blood, Data Mining/methods, Prostate-Specific Antigen/blood, Castration-Resistant Prostatic Neoplasms/diagnosis, Humans, Male, Prognosis
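The discretization step described in this abstract (turning a longitudinal PSA series into a symbol sequence and then looking for a repeated decrease) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the authors' discretization rule, so the three symbols `D`, `U`, and `S`, the function names, and the use of the 0.048 ng/ml per month figure as a symmetric threshold are all hypothetical, and the nadir detection mentioned in the abstract is left out.

```python
def discretize_psa(measurements, threshold=0.048):
    """Map consecutive PSA measurements to symbols.

    measurements: list of (month, psa_ng_per_ml) tuples sorted by month.
    'D' = decrease of more than `threshold` ng/ml per month,
    'U' = increase of more than `threshold` ng/ml per month,
    'S' = anything in between (assumed "stable" class).
    """
    symbols = []
    for (t0, v0), (t1, v1) in zip(measurements, measurements[1:]):
        monthly_change = (v1 - v0) / (t1 - t0)
        if monthly_change < -threshold:
            symbols.append("D")
        elif monthly_change > threshold:
            symbols.append("U")
        else:
            symbols.append("S")
    return "".join(symbols)


def contains_pattern(sequence, pattern):
    """True if `pattern` occurs in `sequence` as an ordered,
    not necessarily contiguous, subsequence."""
    remaining = iter(sequence)
    return all(symbol in remaining for symbol in pattern)


# Hypothetical patient: PSA values at months 0..4.
series = [(0, 1.0), (1, 0.5), (2, 0.5), (3, 0.3), (4, 0.6)]
sequence = discretize_psa(series)
repeated_decrease = contains_pattern(sequence, "DD")
```

A frequent sequential pattern miner would then count, across patients, how often candidate patterns such as `"DD"` occur as subsequences.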
4.
BMC Med Inform Decis Mak ; 15 Suppl 1: S9, 2015.
Article in English | MEDLINE | ID: mdl-26043907

ABSTRACT

BACKGROUND: Bio-entity extraction is a pivotal component for information extraction from biomedical literature. Dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. METHODS: This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities such as Part of Speech (POS) expansion, stemming, and the exploitation of the contextual cues to further improve the performance. RESULTS: The experimental results show that the proposed technique achieves the best or at least equivalent performance among compared techniques, GENIA, MESH, UMLS, and combinations of these three resources in F-measure. CONCLUSIONS: The results imply that the performance of dictionary-based extraction techniques is largely influenced by information resources used to build the dictionary. In addition, the edit distance algorithm shows steady performance with three different dictionaries in precision whereas the context-only technique achieves a high-end performance with three different dictionaries in recall.


Subject(s)
Data Mining/methods, Medical Informatics/methods, Controlled Vocabulary
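To make the role of edit distance in dictionary-based extraction concrete, the sketch below matches a token against dictionary entries that lie within a small edit distance. It is an assumption-laden illustration: it uses the standard Levenshtein distance, which may differ from the "shortest path edit distance" variant in the paper, and the function names, the `max_distance` parameter, and the toy dictionary are hypothetical.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance (insert, delete, substitute = cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                 # delete char_a
                current[j - 1] + 1,              # insert char_b
                previous[j - 1] + substitution,  # substitute or match
            ))
        previous = current
    return previous[-1]


def match_entities(token, dictionary, max_distance=1):
    """Return dictionary entries within `max_distance` edits of `token`,
    comparing case-insensitively."""
    token = token.lower()
    return [entry for entry in dictionary
            if edit_distance(token, entry.lower()) <= max_distance]


# Toy dictionary; a spelling variant without the hyphen still matches.
toy_dictionary = ["interleukin-2", "insulin"]
matches = match_entities("Interleukin2", toy_dictionary)
```

Allowing a small nonzero distance is what raises recall over exact lookup, at the cost of possible false matches between short, similar names.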
5.
BMC Med Inform Decis Mak ; 13 Suppl 1: S8, 2013.
Article in English | MEDLINE | ID: mdl-23691543

ABSTRACT

Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests for structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points whose distance to the query point is less than θ. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9%, respectively, using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.


Subject(s)
Abstracting and Indexing/methods, Computational Biology, Information Storage and Retrieval, Tertiary Protein Structure, Algorithms, Computational Biology/methods, Computational Biology/statistics & numerical data, Protein Databases, Humans, Three-Dimensional Imaging/statistics & numerical data, Information Storage and Retrieval/statistics & numerical data, Molecular Models, Protein Conformation
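The two-stage top-k retrieval described in this abstract (find the top-10 × k candidates on a reduced representation, then select the top-k among them) might be sketched as follows. This is a minimal illustration rather than the authors' implementation: it uses a linear scan in place of the iDistance or iKernel index, assumes Euclidean distance between descriptor vectors, and the names `top_k_two_stage`, `reduced_dims`, and `factor` are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_two_stage(query, vectors, k, reduced_dims=3, factor=10):
    """Return indices of the k vectors closest to `query`.

    Stage 1 ranks all vectors using only their first `reduced_dims`
    attributes and keeps `factor * k` candidates.
    Stage 2 re-ranks those candidates with the full-length distance.
    """
    n_candidates = min(len(vectors), factor * k)
    reduced_query = query[:reduced_dims]
    candidates = heapq.nsmallest(
        n_candidates,
        range(len(vectors)),
        key=lambda i: euclidean(reduced_query, vectors[i][:reduced_dims]),
    )
    return heapq.nsmallest(
        k, candidates, key=lambda i: euclidean(query, vectors[i])
    )


# Toy 4-dimensional descriptors standing in for 3DZD vectors.
toy_vectors = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [5.0, 5.0, 5.0, 5.0],
    [0.1, 0.0, 0.0, 0.2],
]
nearest = top_k_two_stage([0.0, 0.0, 0.0, 0.0], toy_vectors, k=2)
```

The result is approximate in general: a true neighbor can be lost in stage 1 if it ranks outside the `factor * k` candidates on the reduced attributes, which is the trade-off that buys the faster search.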
7.
BMC Bioinformatics ; 12 Suppl 12: S4, 2011 Nov 24.
Article in English | MEDLINE | ID: mdl-22168401

ABSTRACT

BACKGROUND: Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning (AL) and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining AL with semi-supervised learning (SSL) to improve the performance of the PPI task. METHODS: We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly. RESULTS: By conducting experiments with three different PPI corpora, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs in terms of precision, recall, and F-measure. CONCLUSIONS: Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.


Subject(s)
Artificial Intelligence, Natural Language Processing, Proteins/metabolism, Humans, MEDLINE, Protein Interaction Maps, Proteins/chemistry, Semantics, United States
8.
BMC Bioinformatics ; 12 Suppl 2: S6, 2011 Mar 29.
Article in English | MEDLINE | ID: mdl-21489225

ABSTRACT

BACKGROUND: As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. RESULTS: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. CONCLUSIONS: Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.


Subject(s)
Computational Biology/methods, Factual Databases, Information Storage and Retrieval/methods, Software, Algorithms, Internet, Knowledge Bases, Programming Languages, Semantics
9.
Diagn Pathol ; 16(1): 19, 2021 Mar 11.
Article in English | MEDLINE | ID: mdl-33706755

ABSTRACT

BACKGROUND: Immunohistochemistry (IHC) remains the gold standard for the diagnosis of pathological diseases. This technique has been supporting pathologists in making precise decisions regarding differential diagnosis and subtyping, and in creating personalized treatment plans. However, the interpretation of IHC results presents challenges in complicated cases. Furthermore, rapidly increasing amounts of IHC data are making it even harder for pathologists to reach definitive conclusions. METHODS: We developed ImmunoGenius, a machine-learning-based expert system for the pathologist, to support the diagnosis of tumors of unknown origin. Based on Bayes' theorem, the most probable diagnoses can be drawn by calculating the probabilities of the IHC results in each disease. We prepared IHC profile data of 584 antibodies in 2009 neoplasms based on the relevant textbooks. We developed a reactive native mobile application for the iOS and Android platforms that provides the 10 most probable differential diagnoses based on the IHC input. RESULTS: We trained the software using 562 real case data, validated it with 382 case data, tested it with 164 case data and compared the precision hit rate. The precision hit rate was 78.5%, 78.0%, and 89.0% in the training, validation, and test datasets, respectively, which showed no significant difference. The main reasons for discordant precision were a lack of disease-specific IHC markers and overlapping IHC profiles observed in similar diseases. CONCLUSION: The results of this study suggest that a machine-learning-based expert system can support pathologic diagnosis by providing a second opinion on IHC interpretation based on an IHC database. Incorporating contextual data, including clinical and histological findings, might be required to refine the system in the future.


Subject(s)
Immunohistochemistry, Machine Learning, Neoplasms/diagnosis, Neoplasms/pathology, Algorithms, Bayes Theorem, Expert Systems, Humans, Immunohistochemistry/methods, Neoplasms/metabolism, Software
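The Bayes-theorem step in this abstract (ranking diagnoses by the probability of the observed IHC results under each disease) might look like the sketch below. It is not the ImmunoGenius algorithm: it assumes markers are conditionally independent given the disease (a naive Bayes simplification), uses uniform priors unless others are supplied, and treats a marker missing from a profile as uninformative (probability 0.5). The tumor names, marker names, and probabilities are invented for illustration.

```python
import math


def rank_diagnoses(ihc_results, profiles, priors=None, top_n=10):
    """Rank diseases by posterior probability given IHC results.

    ihc_results: dict marker -> True (positive) / False (negative).
    profiles: dict disease -> dict marker -> P(marker positive | disease).
    Returns up to `top_n` (disease, posterior) pairs, most probable first.
    """
    log_scores = {}
    for disease, profile in profiles.items():
        prior = priors[disease] if priors else 1.0 / len(profiles)
        log_score = math.log(prior)
        for marker, positive in ihc_results.items():
            p_positive = profile.get(marker, 0.5)  # assumed default
            p_positive = min(max(p_positive, 1e-6), 1.0 - 1e-6)  # avoid log(0)
            log_score += math.log(p_positive if positive else 1.0 - p_positive)
        log_scores[disease] = log_score

    # Normalize in a numerically stable way so posteriors sum to 1.
    max_log = max(log_scores.values())
    weights = {d: math.exp(s - max_log) for d, s in log_scores.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((d, w / total) for d, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]


# Invented two-disease, two-marker example.
toy_profiles = {
    "tumor_A": {"marker_1": 0.9, "marker_2": 0.1},
    "tumor_B": {"marker_1": 0.2, "marker_2": 0.8},
}
ranking = rank_diagnoses({"marker_1": True, "marker_2": False}, toy_profiles)
```

Working in log space keeps the product of many small probabilities from underflowing when hundreds of antibodies are involved.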
10.
BMC Bioinformatics ; 11 Suppl 2: S6, 2010 Apr 16.
Article in English | MEDLINE | ID: mdl-20406504

ABSTRACT

BACKGROUND: Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without being integrated with the keyword queries, and the users have to provide a large number of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. RESULTS: RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. CONCLUSIONS: RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.


Subject(s)
Algorithms, Artificial Intelligence, Computational Biology/methods, Database Management Systems, PubMed, Statistical Data Interpretation, Feedback, Reproducibility of Results, User-Computer Interface
11.
J Pathol Transl Med ; 54(6): 462-470, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32854491

ABSTRACT

BACKGROUND: Immunohistochemistry (IHC) has played an essential role in the diagnosis of hematolymphoid neoplasms. However, IHC interpretations can be challenging in daily practice, and exponentially expanding volumes of IHC data are making the task increasingly difficult. We therefore developed a machine-learning expert-supporting system for diagnosing lymphoid neoplasms. METHODS: A probabilistic decision-tree algorithm based on Bayes' theorem was used to develop mobile application software for iOS and Android platforms. We tested the software with real data from 602 training and 392 validation cases of lymphoid neoplasms and compared the precision hit rates between the training and validation datasets. RESULTS: IHC expression data for 150 lymphoid neoplasms and 584 antibodies were gathered. The precision hit rates of 94.7% in the training data and 95.7% in the validation data for lymphomas were not significantly different. Results in most B-cell lymphomas were excellent, and generally equivalent performance was seen in T-cell lymphomas. The primary reasons for lack of precision were atypical IHC profiles for certain cases (e.g., CD15-negative Hodgkin lymphoma), a lack of disease-specific markers, and overlapping IHC profiles of similar diseases. CONCLUSIONS: Application of the machine-learning algorithm to diagnosis precision produced acceptable hit rates in training and validation datasets. Because of the lack of origin- or disease-specific markers in differential diagnosis, contextual information such as clinical and histological features should be taken into account to make proper use of this system in the pathologic decision-making process.

12.
J Am Med Inform Assoc ; 27(9): 1411-1419, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32989459

ABSTRACT

OBJECTIVE: Recent studies on electronic health records (EHRs) started to learn deep generative models and synthesize a huge amount of realistic records, in order to address significant privacy issues surrounding the EHR. However, most of them only focus on structured records about patients' independent visits, rather than on chronological clinical records. In this article, we aim to learn and synthesize realistic sequences of EHRs based on the generative autoencoder. MATERIALS AND METHODS: We propose a dual adversarial autoencoder (DAAE), which learns set-valued sequences of medical entities, by combining a recurrent autoencoder with 2 generative adversarial networks (GANs). DAAE improves the mode coverage and quality of generated sequences by adversarially learning both the continuous latent distribution and the discrete data distribution. Using the MIMIC-III (Medical Information Mart for Intensive Care-III) and UT Physicians clinical databases, we evaluated the performance of DAAE in terms of predictive modeling, plausibility, and privacy preservation. RESULTS: Our generated sequences of EHRs showed performance comparable to real data for a predictive modeling task, and achieved the best score in plausibility evaluation conducted by medical experts among all baseline models. In addition, differentially private optimization of our model enables generation of synthetic sequences without increasing the privacy leakage of patients' data. CONCLUSIONS: DAAE can effectively synthesize sequential EHRs by addressing its main challenges: the synthetic records should be realistic enough not to be distinguished from the real records, and they should cover all the training patients to reproduce the performance of specific downstream tasks.


Subject(s)
Computer Simulation, Electronic Health Records, Computer Neural Networks, Confidentiality, Humans, Machine Learning, Software
13.
Artif Intell Med ; 42(1): 37-53, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17997291

ABSTRACT

OBJECTIVE: Diabetic nephropathy is damage to the kidney caused by diabetes mellitus. It is a common complication and a leading cause of death in people with diabetes. However, the decline in kidney function varies considerably between patients and the determinants of diabetic nephropathy have not been clearly identified. Therefore, it is very difficult to predict the onset of diabetic nephropathy accurately with simple statistical approaches such as the t-test or chi-squared test. To accurately predict the onset of diabetic nephropathy, we applied various machine learning techniques to an irregular and unbalanced diabetes dataset, such as support vector machine (SVM) classification and feature selection methods. Visualization of the risk factors was another important objective to give physicians intuitive information on each patient's clinical pattern. METHODS AND MATERIALS: We collected medical data from 292 patients with diabetes and performed preprocessing to extract 184 features from the irregular data. To predict the onset of diabetic nephropathy, we compared several classification methods such as logistic regression, SVM, and SVM with a cost sensitive learning method. We also applied several feature selection methods to remove redundant features and improve the classification performance. For risk factor analysis with SVM classifiers, we have developed a new visualization system which uses a nomogram approach. RESULTS: Linear SVM classifiers combined with wrapper or embedded feature selection methods showed the best results. Among the 184 features, the classifiers selected the same 39 features and gave 0.969 of the area under the curve by receiver operating characteristics analysis. The visualization tool was able to present the effect of each feature on the decision via graphical output. CONCLUSIONS: Our proposed method can predict the onset of diabetic nephropathy about 2-3 months before the actual diagnosis with high prediction performance from an irregular and unbalanced dataset, which statistical methods such as t-test and logistic regression could not achieve. Additionally, the visualization system provides physicians with intuitive information for risk factor analysis. Therefore, physicians can benefit from the automatic early warning of each patient and visualize risk factors, which facilitate planning of effective and proper treatment strategies.


Subject(s)
Clinical Decision Support Systems, Diabetic Nephropathies/diagnosis, Computer-Assisted Diagnosis, Algorithms, Artificial Intelligence, Humans, Logistic Models, Computer Neural Networks, Predictive Value of Tests, ROC Curve, Risk Factors
14.
IEEE Trans Inf Technol Biomed ; 12(2): 247-56, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18348954

ABSTRACT

Nonlinear classifiers, e.g., support vector machines (SVMs) with radial basis function (RBF) kernels, have been used widely for automatic diagnosis of diseases because of their high accuracies. However, it is difficult to visualize the classifiers, and thus difficult to provide intuitive interpretation of results to physicians. We developed a new nonlinear kernel, the localized radial basis function (LRBF) kernel, and a new visualization system, visualization for risk factor analysis (VRIFA), that applies a nomogram and the LRBF kernel to visualize the results of nonlinear SVMs and improve the interpretability of results while maintaining high prediction accuracy. Three representative medical datasets from the University of California, Irvine repository and the Statlog dataset (breast cancer, diabetes, and heart disease datasets) were used to evaluate the system. The results showed that the classification performance of the LRBF is comparable with that of the RBF, and the LRBF is easy to visualize via a nomogram. Our study also showed that the LRBF kernel is less sensitive to noise features than the RBF kernel, whereas the LRBF kernel degrades the prediction accuracy more when important features are eliminated. We demonstrated the VRIFA system, which visualizes the results of linear and nonlinear SVMs with LRBF kernels, on the three datasets.


Subject(s)
Artificial Intelligence, Computer-Assisted Diagnosis/methods, Automated Pattern Recognition/methods, Proportional Hazards Models, Risk Assessment/methods, User-Computer Interface, Computer Graphics, Nonlinear Dynamics, Prognosis, Risk Factors
15.
PLoS One ; 13(6): e0197518, 2018.
Article in English | MEDLINE | ID: mdl-29897980

ABSTRACT

Several studies have been conducted to evaluate the efficacy of statins in Korean and Asian patients. However, most previous studies only observed the percent reduction in low-density lipoprotein cholesterol (LDL-C) and did not consider the effects of various patient conditions simultaneously, such as abnormal test results, patient demographics, and prescribed drugs before taking a statin. Moreover, the characteristics of the patients whose percent reduction in LDL-C was higher than expected were not provided. Therefore, in this study, we aimed to derive meaningful phenotypes by using tensor factorization to observe the characteristics of the patients whose percent reduction in LDL-C was higher than expected among patients taking moderate-intensity statins. In addition, we used the derived phenotypes to predict how much the LDL-C levels of new patients decreased. We consequently identified eight phenotypes that represented the characteristics of the patients whose percent reduction in LDL-C was higher than expected. Moreover, the latent representations of the derived phenotypes achieved prediction performance similar to that obtained using the raw data. These results demonstrate that the derived phenotypes and latent representations are useful tools for observing the characteristics of patients and predicting LDL-C levels. Additionally, our findings provide direction on how to conduct clinical studies in the future.


Subject(s)
LDL Cholesterol/blood, Diabetes Mellitus/drug therapy, Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use, Hypercholesterolemia/drug therapy, Adult, Aged, Aged 80 and over, Blood Glucose, Diabetes Mellitus/epidemiology, Diabetes Mellitus/pathology, Female, Geriatrics, Glycated Hemoglobin/metabolism, Humans, Hypercholesterolemia/blood, Hypercholesterolemia/pathology, Male, Middle Aged, Simvastatin/therapeutic use
16.
KDD ; 2017: 887-895, 2017 Aug.
Article in English | MEDLINE | ID: mdl-29071165

ABSTRACT

Tensor factorization models offer an effective approach to convert massive electronic health records into meaningful clinical concepts (phenotypes) for data analysis. These models need a large amount of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals, in which direct patient-level data sharing is not possible (e.g., due to institutional policies). In this paper, we developed a novel solution to enable federated tensor factorization for computational phenotyping without sharing patient-level data. We developed secure data harmonization and federated computation procedures based on alternating direction method of multipliers (ADMM). Using this method, the multiple hospitals iteratively update tensors and transfer secure summarized information to a central server, and the server aggregates the information to generate phenotypes. We demonstrated with real medical datasets that our method resembles the centralized training model (based on combined datasets) in terms of accuracy and phenotypes discovery while respecting privacy.

17.
Sci Rep ; 7(1): 1114, 2017 04 25.
Article in English | MEDLINE | ID: mdl-28442772

ABSTRACT

Adoption of Electronic Health Record (EHR) systems has led to collection of massive healthcare data, which creates opportunities and challenges to study them. Computational phenotyping offers a promising way to convert the sparse and complex data into meaningful concepts that are interpretable to healthcare givers to make use of them. We propose a novel supervised nonnegative tensor factorization methodology that derives discriminative and distinct phenotypes. We represented co-occurrence of diagnoses and prescriptions in EHRs as a third-order tensor, and decomposed it using the CP algorithm. We evaluated discriminative power of our models with an Intensive Care Unit database (MIMIC-III) and demonstrated superior performance to state-of-the-art ICU mortality calculators (e.g., APACHE II, SAPS II). Examples of the resulting phenotypes are sepsis with acute kidney injury, cardiac surgery, anemia, respiratory failure, heart failure, cardiac arrest, metastatic cancer (requiring ICU), end-stage dementia (requiring ICU and transitioned to comfort-care), intraabdominal conditions, and alcohol abuse/withdrawal.


Subject(s)
Computer Simulation, Decision Support Techniques, Electronic Health Records, Phenotype, Humans, Statistical Models
18.
PLoS One ; 12(6): e0177629, 2017.
Article in English | MEDLINE | ID: mdl-28636614

ABSTRACT

Excessive smartphone use causes personal and social problems. To address this issue, we sought to derive usage patterns that were directly correlated with smartphone dependence based on usage data. This study attempted to classify smartphone dependence using a data-driven prediction algorithm. We developed a mobile application to collect smartphone usage data. A total of 41,683 logs of 48 smartphone users were collected from March 8, 2015, to January 8, 2016. The participants were classified into the control group (SUC) or the addiction group (SUD) using the Korean Smartphone Addiction Proneness Scale for Adults (S-Scale) and a face-to-face offline interview by a psychiatrist and a clinical psychologist (SUC = 23 and SUD = 25). We derived usage patterns using tensor factorization and found the following six optimal usage patterns: 1) social networking services (SNS) during daytime, 2) web surfing, 3) SNS at night, 4) mobile shopping, 5) entertainment, and 6) gaming at night. The membership vectors of the six patterns obtained a significantly better prediction performance than the raw data. For all patterns, the usage times of the SUD were much longer than those of the SUC. From our findings, we concluded that usage patterns and membership vectors were effective tools to assess and predict smartphone dependence and could provide an intervention guideline to predict and treat smartphone dependence based on usage data.


Subject(s)
Algorithms, Addictive Behavior/psychology, Statistical Factor Analysis, Smartphone/statistics & numerical data, Adult, Female, Humans, Male, Young Adult
19.
PLoS One ; 11(8): e0159788, 2016.
Article in English | MEDLINE | ID: mdl-27533112

ABSTRACT

The purpose of this study was to identify personality factor-associated predictors of smartphone addiction predisposition (SAP). Participants were 2,573 men and 2,281 women (n = 4,854) aged 20-49 years (Mean ± SD: 33.47 ± 7.52); participants completed the following questionnaires: the Korean Smartphone Addiction Proneness Scale (K-SAPS) for adults, the Behavioral Inhibition System/Behavioral Activation System questionnaire (BIS/BAS), the Dickman Dysfunctional Impulsivity Instrument (DDII), and the Brief Self-Control Scale (BSCS). In addition, participants reported their demographic information and smartphone usage pattern (weekday or weekend average usage hours and main use). We analyzed the data in three steps: (1) identifying predictors with logistic regression, (2) deriving causal relationships between SAP and its predictors using a Bayesian belief network (BN), and (3) computing optimal cut-off points for the identified predictors using the Youden index. Identified predictors of SAP were as follows: gender (female), weekend average usage hours, and scores on BAS-Drive, BAS-Reward Responsiveness, DDII, and BSCS. Female gender and scores on BAS-Drive and BSCS directly increased SAP. BAS-Reward Responsiveness and DDII indirectly increased SAP. We found that SAP was defined with maximal sensitivity as follows: weekend average usage hours > 4.45, BAS-Drive > 10.0, BAS-Reward Responsiveness > 13.8, DDII > 4.5, and BSCS > 37.4. This study raises the possibility that personality factors contribute to SAP. We also calculated cut-off points for key predictors. These findings may assist clinicians screening for SAP using cut-off points, and further the understanding of SA risk factors.


Subject(s)
Compulsive Behavior/psychology, Self-Control/psychology, Smartphone/statistics & numerical data, Adult, Female, Humans, Male, Middle Aged, Republic of Korea, Surveys and Questionnaires, Young Adult
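Step (3) of this abstract computes cut-off points with the Youden index, J = sensitivity + specificity - 1, choosing the threshold that maximizes J. A minimal sketch is given below; it assumes the rule "score > cut-off predicts a positive case" (matching the "> 4.45" style of the reported cut-offs), takes candidate cut-offs from the observed scores only, and uses invented data. The function name `youden_cutoff` is hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    scores: numeric predictor values, one per participant.
    labels: 1 for a positive case, 0 for a negative case.
    A participant is predicted positive when score > cutoff.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
        true_neg = sum(1 for s, y in zip(scores, labels) if y != 1 and s <= cutoff)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # strict comparison keeps the lowest cut-off on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Invented example: usage hours for six participants, three of them positive.
toy_hours = [1, 2, 3, 4, 5, 6]
toy_labels = [0, 0, 0, 1, 1, 1]
cutoff, j_statistic = youden_cutoff(toy_hours, toy_labels)
```

With real data the classes overlap, so J stays below 1 and the chosen cut-off trades sensitivity against specificity with equal weight.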
20.
Sci Rep ; 6: 25419, 2016 05 05.
Article in English | MEDLINE | ID: mdl-27146602

ABSTRACT

To assess the impact of lymphovascular invasion (LVI) on the risk of biochemical recurrence (BCR) in pT3 N0 prostate cancer, clinical data were extracted from 1,622 patients with pT3 N0 prostate cancer from the K-CaP database. Patients with neoadjuvant androgen deprivation therapy (n = 325) or insufficient pathologic or follow-up data (n = 87) were excluded. The primary endpoint was the oncologic importance of LVI, and the secondary endpoint was the hierarchical relationships for estimating BCR between the evaluated variables. LVI was noted in 260 patients (21.5%) and was significantly associated with other adverse clinicopathologic features. In the multivariate Cox regression analysis, LVI was significantly associated with an increased risk of BCR after adjusting for known prognostic factors. In the Bayesian belief network analysis, LVI and pathologic Gleason score were found to be first-degree associates of BCR, whereas prostate-specific antigen (PSA) level, seminal vesicle invasion, perineural invasion, and high-grade prostatic intraepithelial neoplasia were considered second-degree associates. In the random survival forest, pathologic Gleason score, LVI, and PSA level were the three most important variables in determining BCR of patients with pT3 N0 prostate cancer. In conclusion, LVI is one of the most powerful adverse prognostic factors for BCR in patients with pT3 N0 prostate cancer.


Subject(s)
Local Neoplasm Recurrence/pathology, Prostate-Specific Antigen/metabolism, Prostatic Neoplasms/pathology, Vascular Neoplasms/secondary, Aged, Bayes Theorem, Humans, Lymphatic Metastasis, Male, Middle Aged, Neoplasm Invasiveness, Local Neoplasm Recurrence/metabolism, Neoplasm Staging, Prognosis, Prostatic Neoplasms/metabolism, Survival Analysis, Vascular Neoplasms/metabolism