Results 1 - 20 of 33
1.
Mem Cognit ; 51(3): 601-622, 2023 04.
Article in English | MEDLINE | ID: mdl-36542319

ABSTRACT

One of the central issues in cognition is identifying universal and culturally specific patterns of thought. In this study, we examined how one aspect of culture, a linguistic part of speech known as classifiers, is related to categorization of solid objects. In Experiment 1, we used a numeral classifier elicitation task to examine the classifiers used by speakers of Hmong, Japanese, and Mandarin Chinese (N = 34) with 135 nouns that referred to solid objects. In Experiment 2, adult speakers of English, Japanese, Mandarin Chinese, and Hmong (N = 64) rated the similarity of 39 pictured objects that depicted a subset of the nouns. All groups classified the objects into natural kinds and artifacts, with the category of humans anchoring both divisions. The main difference that emerged from the study was that speakers of Japanese and English rated humans and animals as more similar to each other than Hmong speakers did; Mandarin speakers' ratings of the similarity between humans and animals fell in between those of Hmong and English speakers. However, the pattern of categorization of humans and animals found among speakers of the classifier languages contradicted their patterns of classifier use. The findings help to tease apart the effects of language from other cultural factors that impact cognition.


Subject(s)
Cross-Cultural Comparison; Language; Adult; Humans; Cognition; Speech
2.
IEEE Trans Nanobioscience ; 18(3): 324-334, 2019 07.
Article in English | MEDLINE | ID: mdl-30951476

ABSTRACT

Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word-embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15714 sentences (447 [2.8%] describing serendipitous drug usage). Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Context information helped to reduce the false-positive rate of deep neural network models. If we used an extremely imbalanced dataset with limited instances of serendipitous drug usage, deep neural network models did not outperform other machine-learning models with n-gram and context features. However, deep neural network models could more effectively use word embedding in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.


Subject(s)
Data Mining/methods; Drug Repositioning/methods; Machine Learning; Medical Informatics/methods; Social Media; Algorithms; Drug Discovery; Humans
3.
IEEE Trans Nanobioscience ; 17(3): 165-171, 2018 07.
Article in English | MEDLINE | ID: mdl-29993581

ABSTRACT

Data mapping plays an important role in data integration and exchanges among institutions and organizations with different data standards. However, traditional rule-based approaches and machine learning methods fail to achieve satisfactory results for the data mapping problem. In this paper, we propose a novel and sophisticated deep learning framework for data mapping called mixture feature embedding convolutional neural network (MfeCNN). The MfeCNN model converts the data mapping task to a multiple classification problem. In the model, we incorporated multimodal learning and multiview embedding into a CNN for mixture feature tensor generation and classification prediction. Multimodal features were extracted from various linguistic spaces with a medical natural language processing package. Then, powerful feature embeddings were learned by using the CNN. As many as 10 classes could be simultaneously classified by a softmax prediction layer based on multiview embedding. MfeCNN achieved the best results on unbalanced data (average F1 score, 82.4%) among the traditional state-of-the-art machine learning models and CNN without mixture feature embedding. Our model also outperformed a very deep CNN with 29 layers, which took free texts as inputs. The combination of mixture feature embedding and a deep neural network can achieve high accuracy for data mapping and multiple classification.


Subject(s)
Computational Biology/methods; Deep Learning; Neural Networks, Computer; Data Mining; Humans; Natural Language Processing; Workflow
4.
PLoS One ; 13(1): e0191568, 2018.
Article in English | MEDLINE | ID: mdl-29373609

ABSTRACT

Recent scientific advances have accumulated a tremendous amount of biomedical knowledge providing novel insights into the relationship between molecular and cellular processes and diseases. Literature mining is one of the commonly used methods to retrieve and extract information from scientific publications for understanding these associations. However, due to large data volume and complicated, noisy associations, the interpretability of such association data for semantic knowledge discovery is challenging. In this study, we describe an integrative computational framework aiming to expedite the discovery of latent disease mechanisms by dissecting 146,245 disease-gene associations from over 25 million PubMed-indexed articles. We take advantage of both Latent Dirichlet Allocation (LDA) modeling and network-based analysis for their capabilities of detecting latent associations and reducing noise in large-volume data, respectively. Our results demonstrate that (1) the LDA-based modeling is able to group similar diseases into disease topics; (2) the disease-specific association networks follow the scale-free network property; (3) certain subnetwork patterns were enriched in the disease-specific association networks; and (4) genes were enriched in topic-specific biological processes. Our approach offers promising opportunities for latent disease-gene knowledge discovery in biomedical research.


Subject(s)
Genetic Predisposition to Disease; PubMed; Data Mining; Humans
5.
AMIA Jt Summits Transl Sci Proc ; 2016: 428-37, 2016.
Article in English | MEDLINE | ID: mdl-27595047

ABSTRACT

It is widely acknowledged that natural language processing is indispensable for processing electronic health records (EHRs). However, poor performance in relation detection tasks, such as coreference (linguistic expressions pertaining to the same entity/event), may affect the quality of EHR processing. Hence, there is a critical need to advance research on relation detection from EHRs. Most clinical coreference resolution systems are based on either supervised machine learning or rule-based methods. The need for a manually annotated corpus hampers the use of such systems at large scale. In this paper, we present an infinite mixture model method using definite sampling to resolve coreferent relations among mentions in clinical notes. A similarity measure function is proposed to determine the coreferent relations. Our system achieved a 0.847 F-measure on the i2b2 2011 coreference corpus. These promising results and the unsupervised nature of the method make it possible to apply the system in big-data clinical settings.

6.
J Biomed Inform ; 63: 379-389, 2016 10.
Article in English | MEDLINE | ID: mdl-27593166

ABSTRACT

In the era of digitalization, information retrieval (IR), which retrieves and ranks documents from large collections according to users' search queries, has been popularly applied in the biomedical domain. Building patient cohorts using electronic health records (EHRs) and searching literature for topics of interest are some IR use cases. Meanwhile, natural language processing (NLP), such as tokenization or Part-Of-Speech (POS) tagging, has been developed for processing clinical documents or biomedical literature. We hypothesize that NLP can be incorporated into IR to strengthen the conventional IR models. In this study, we propose two NLP-empowered IR models, POS-BoW and POS-MRF, which incorporate automatic POS-based term weighting schemes into bag-of-words (BoW) and Markov Random Field (MRF) IR models, respectively. In the proposed models, the POS-based term weights are iteratively calculated by utilizing a cyclic coordinate method where a golden section line search algorithm is applied along each coordinate to optimize the objective function defined by mean average precision (MAP). In the empirical experiments, we used the data sets from the Medical Records track in Text REtrieval Conference (TREC) 2011 and 2012 and the Genomics track in TREC 2004. The evaluation on TREC 2011 and 2012 Medical Records tracks shows that, for the POS-BoW models, the mean improvement rates for IR evaluation metrics, MAP, bpref, and P@10, are 10.88%, 4.54%, and 3.82%, compared to the BoW models; and for the POS-MRF models, these rates are 13.59%, 8.20%, and 8.78%, compared to the MRF models. Additionally, we experimentally verify that the proposed weighting approach is superior to the simple heuristic and frequency-based weighting approaches, and validate our POS category selection. Using the optimal weights calculated in this experiment, we tested the proposed models on the TREC 2004 Genomics track and obtained average improvement rates of 8.63% and 10.04% for POS-BoW and POS-MRF, respectively. These significant improvements verify the effectiveness of leveraging POS tagging for biomedical IR tasks.
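
The cyclic coordinate method with a golden section line search mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the [0, 1] weight range, the number of sweeps, and the toy objective (a smooth stand-in for MAP, which in practice would require running retrieval and scoring) are all assumptions made for illustration.

```python
import math


def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def cyclic_coordinate_ascent(objective, weights, lo=0.0, hi=1.0, sweeps=20):
    """Optimize one weight at a time, cycling through all coordinates."""
    weights = list(weights)
    for _ in range(sweeps):
        for i in range(len(weights)):
            def along_axis(x, i=i):
                # Vary only coordinate i; keep the other weights fixed.
                trial = weights[:i] + [x] + weights[i + 1:]
                return objective(trial)
            weights[i] = golden_section_max(along_axis, lo, hi)
    return weights


def toy_objective(w):
    # Hypothetical concave stand-in for MAP, peaking at weights (0.3, 0.7).
    dx, dy = w[0] - 0.3, w[1] - 0.7
    return -(dx * dx) - (dy * dy) - 0.5 * dx * dy


best = cyclic_coordinate_ascent(toy_objective, [0.5, 0.5])
```

Golden section search assumes the objective is unimodal along each coordinate; a real MAP surface may not be, so the result of such a search would be a local optimum at best.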


Subject(s)
Electronic Health Records; Information Storage and Retrieval; Natural Language Processing; Algorithms; Humans; Linguistics
7.
Biomed Inform Insights ; 8(Suppl 1): 13-22, 2016.
Article in English | MEDLINE | ID: mdl-27385912

ABSTRACT

The concept of optimizing health care by understanding and generating knowledge from previous evidence, i.e., the Learning Health-care System (LHS), has gained momentum and now has national prominence. Meanwhile, the rapid adoption of electronic health records (EHRs) enables the data collection required to form the basis for facilitating LHS. A prerequisite for using EHR data within the LHS is an infrastructure that enables access to EHR data longitudinally for health-care analytics and in real time for knowledge delivery. Additionally, significant clinical information is embedded in the free text, making natural language processing (NLP) an essential component in implementing an LHS. Herein, we share our institutional implementation of a big data-empowered clinical NLP infrastructure, which not only enables health-care analytics but also has real-time NLP processing capability. The infrastructure has been utilized for multiple institutional projects including the MayoExpertAdvisor, an individualized care recommendation solution for clinical care. We compared the advantages of big data over two other environments. Big data infrastructure significantly outperformed other infrastructure in terms of computing speed, demonstrating its value in making the LHS a possibility in the near future.

8.
J Biomed Inform ; 63: 11-21, 2016 10.
Article in English | MEDLINE | ID: mdl-27444185

ABSTRACT

BACKGROUND: Constructing standard and computable clinical diagnostic criteria is an important but challenging research field in the clinical informatics community. The Quality Data Model (QDM) is emerging as a promising information model for standardizing clinical diagnostic criteria. OBJECTIVE: To develop and evaluate automated methods for converting textual clinical diagnostic criteria into a structured format using QDM. METHODS: We used a clinical Natural Language Processing (NLP) tool known as cTAKES to detect sentences and annotate events in diagnostic criteria. We developed a rule-based approach for assigning the QDM datatype(s) to an individual criterion, whereas we invoked a machine learning algorithm based on Conditional Random Fields (CRFs) for annotating attributes belonging to each particular QDM datatype. We manually developed an annotated corpus as the gold standard and used standard measures (precision, recall and f-measure) for the performance evaluation. RESULTS: We harvested 267 individual criteria with the datatypes of Symptom and Laboratory Test from 63 textual diagnostic criteria. We manually annotated attributes and values in 142 individual Laboratory Test criteria. The average performance of our rule-based approach was a precision of 0.84, a recall of 0.86, and an f-measure of 0.85; the performance of CRFs-based classification was a precision of 0.95, a recall of 0.88, and an f-measure of 0.91. We also implemented a web-based tool that automatically translates textual Laboratory Test criteria into the QDM XML template format. The results indicated that our approaches leveraging cTAKES and CRFs are effective in facilitating diagnostic criteria annotation and classification. CONCLUSION: Our NLP-based computational framework is a feasible and useful solution in developing diagnostic criteria representation and computerization.
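
As a quick check on how the reported figures relate, the f-measure is the harmonic mean of precision and recall. The short Python sketch below (the function name and the `beta` parameter are illustrative choices, not taken from the paper) reproduces the two f-measure values from the stated precision and recall.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall (F1 when beta=1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


rule_based = round(f_measure(0.84, 0.86), 2)  # 0.85
crf_based = round(f_measure(0.95, 0.88), 2)   # 0.91
```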


Subject(s)
Algorithms; Data Accuracy; Diagnosis, Computer-Assisted; Natural Language Processing; Humans; Machine Learning
10.
Sci Rep ; 6: 24955, 2016 04 22.
Article in English | MEDLINE | ID: mdl-27102014

ABSTRACT

Increasing evidence has shown that sex differences exist in Adverse Drug Events (ADEs). Identifying those sex differences in ADEs could reduce the experience of ADEs for patients and could be conducive to the development of personalized medicine. In this study, we analyzed a normalized US Food and Drug Administration Adverse Event Reporting System (FAERS). A chi-squared test was conducted to discover which treatment regimens or drugs had sex differences in adverse events. Moreover, the reporting odds ratio (ROR) and P value were calculated to quantify the signals of sex differences for specific drug-event combinations. Logistic regression was applied to remove the confounding effect from the baseline sex difference of the events. Among 668 drugs in the 20 most frequent treatment regimens in the United States, we detected 307 drugs with sex differences in ADEs. In addition, we identified 736 unique drug-event combinations with significant sex differences. After removing the confounding effect from the baseline sex difference of the events, 266 combinations remained. Drug labels or previous studies verified some of them while others warrant further investigation.
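
For readers unfamiliar with the reporting odds ratio, the following Python sketch shows the standard 2x2 calculation with a 95% confidence interval on the log scale. The cell layout (female versus male reports, with versus without the event, for one drug) is an assumption about how such a comparison could be set up, and the counts are invented for illustration; neither comes from the study.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR with a confidence interval from a 2x2 table of report counts.

    Assumed layout for one drug-event pair:
      a: female reports with the event    b: female reports without it
      c: male reports with the event      d: male reports without it
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


# Hypothetical counts, not taken from FAERS.
ror, lower, upper = reporting_odds_ratio(40, 60, 20, 80)
```

A lower bound above 1 (or an upper bound below 1) is a common signal criterion; the abstract's additional logistic regression step, which adjusts for the baseline sex difference of each event, is not covered by this sketch.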


Subject(s)
Drug-Related Side Effects and Adverse Reactions/epidemiology; Humans; Sex Factors; United States
11.
Oncotarget ; 7(14): 17468-78, 2016 Apr 05.
Article in English | MEDLINE | ID: mdl-27003359

ABSTRACT

Gut microbiota plays a dual role in chronic kidney disease (CKD) and is closely linked to production of uremic toxins. Strategies of reducing uremic toxins by targeting gut microbiota are emerging. It is known that Chinese medicine rhubarb enema can reduce uremic toxins and improve renal function. However, it remains unknown which ingredient or mechanism mediates its effect. Here we utilized a rat CKD model of 5/6 nephrectomy to evaluate the effect of emodin, a main ingredient of rhubarb, on gut microbiota and uremic toxins in CKD. Emodin was administered via colonic irrigation at 5 ml (1 mg/day) for four weeks. We found that emodin via colonic irrigation (ECI) altered levels of two important uremic toxins, urea and indoxyl sulfate (IS), and changed gut microbiota in rats with CKD. ECI remarkably reduced urea and IS and improved renal function. Pyrosequencing and real-time qPCR analyses revealed that ECI restored the microbial balance from an abnormal status in CKD. We also demonstrated that ten genera were positively correlated with urea while four genera exhibited a negative correlation. Moreover, three genera were positively correlated with IS. Therefore, emodin altered the gut microbiota structure. It reduced the number of harmful bacteria, such as Clostridium spp., which are positively correlated with both urea and IS, but augmented the number of beneficial bacteria, including Lactobacillus spp., which are negatively correlated with urea. Thus, changes in gut microbiota induced by emodin via colonic irrigation are closely associated with reduction in uremic toxins and mitigation of renal injury.


Subject(s)
Emodin/administration & dosage; Gastrointestinal Microbiome/drug effects; Indican/blood; Microbiota/drug effects; Renal Insufficiency, Chronic/drug therapy; Renal Insufficiency, Chronic/microbiology; Urea/blood; Animals; Male; Random Allocation; Rats; Rats, Sprague-Dawley; Renal Insufficiency, Chronic/blood; Therapeutic Irrigation/methods
12.
AMIA Annu Symp Proc ; 2016: 789-798, 2016.
Article in English | MEDLINE | ID: mdl-28269875

ABSTRACT

Classification of drug-drug interaction (DDI) from the medical literature is significant in preventing medication-related errors. Most of the existing machine learning approaches are based on supervised learning methods. However, the dynamic nature of drug knowledge, combined with the enormity and rapid growth of the biomedical literature, makes supervised DDI classification methods prone to overfitting the corpora, and such methods may not meet the needs of real-world applications. In this paper, we proposed a relation classification framework based on topic modeling (RelTM) augmented with distant supervision for the task of DDI classification from biomedical text. The uniqueness of RelTM lies in its two-level sampling from both DDI and drug entities. Through this design, RelTM takes both relation features and drug mention features into consideration. An efficient inference algorithm for the model using Gibbs sampling is also proposed. Compared to the previous supervised models, our approach does not require human efforts such as annotation and labeling, which is its advantage in trending big data applications. Meanwhile, the distant supervision combination allows RelTM to incorporate rich existing knowledge resources provided by domain experts. The experimental results on the 2013 DDI challenge corpus reach 48% in F1 score, showing the effectiveness of RelTM.


Subject(s)
Algorithms; Drug Interactions; Machine Learning; Humans; Information Storage and Retrieval/methods
13.
Am J Inf Manag ; 1(1): 1-9, 2016 Nov.
Article in English | MEDLINE | ID: mdl-29071308

ABSTRACT

Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden associated with abstract screening in SRs and provide a high-level information summary. A text-mining SR supporting framework consisting of three self-defined semantics-based ranking metrics was proposed, including keyword relevance, indexed-term relevance and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from indexed vocabulary developed by domain experts used for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian-based model for discovering topics. We tested the proposed framework using three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer and Influenza Vaccine). The results showed that when 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved, the recalls were as high as 100% for the three cases, respectively. Relevant studies identified manually showed strong topic similarity through topic analysis, which supported the inclusion of topic analysis as a relevance metric. It was demonstrated that advanced text mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.

14.
Stud Health Technol Inform ; 216: 539-43, 2015.
Article in English | MEDLINE | ID: mdl-26262109

ABSTRACT

Relation extraction typically involves the extraction of relations between two or more entities occurring within a single or multiple sentences. In this study, we investigated the significance of extracting information from multiple sentences specifically in the context of drug-disease relation discovery. We used multiple resources such as Semantic Medline, a literature-based resource, and Medline search (for filtering spurious results) and inferred 8,772 potential drug-disease pairs. Our analysis revealed that 6,450 (73.5%) of the 8,772 potential drug-disease relations did not occur in a single sentence. Moreover, only 537 of the drug-disease pairs matched the curated gold standard in the Comparative Toxicogenomics Database (CTD), a trusted resource for drug-disease relations. Among the 537, nearly 75% (407) of the drug-disease pairs occur in multiple sentences. Our analysis revealed that the drug-disease pairs inferred from Semantic Medline or retrieved from CTD could be extracted from multiple sentences in the literature. This highlights the need for discourse-level analysis in extracting relations from biomedical literature.


Subject(s)
Data Mining/methods; Drug-Related Side Effects and Adverse Reactions/classification; MEDLINE; Natural Language Processing; Periodicals as Topic/classification; Semantics; Humans; Machine Learning; Needs Assessment; Science
15.
Stud Health Technol Inform ; 216: 604-8, 2015.
Article in English | MEDLINE | ID: mdl-26262122

ABSTRACT

In this study we have developed a rule-based natural language processing (NLP) system to identify patients with a family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. The family history identification precision was consistent across the institutions, shifting from 88.9% on the Indiana University (IU) dataset to 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. The family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6% and 82.6%, respectively. Negation detection resulted in a precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customization of the algorithm on the new dataset improves its performance.


Subject(s)
Electronic Health Records/classification; Information Storage and Retrieval/methods; Medical History Taking/methods; Natural Language Processing; Pancreatic Neoplasms/diagnosis; Pancreatic Neoplasms/genetics; Algorithms; Genetic Predisposition to Disease/epidemiology; Genetic Predisposition to Disease/genetics; Humans; Medical History Taking/statistics & numerical data; Medical Record Linkage; Pancreatic Neoplasms/epidemiology
16.
Article in English | MEDLINE | ID: mdl-26306249

ABSTRACT

Bibliometric analysis is a research method used in library and information science to evaluate research performance. It applies quantitative and statistical analyses to describe patterns observed in a set of publications and can help identify previous, current, and future research trends or focus. To better guide our institutional strategic plan in cancer population science, we conducted bibliometric analysis on publications of investigators currently funded by either the Division of Cancer Prevention (DCP) or the Division of Cancer Control and Population Sciences (DCCPS) at the National Cancer Institute. We applied two topic modeling techniques: author topic modeling (AT) and dynamic topic modeling (DTM). Our initial results show that AT can reasonably address the issues related to investigators' research interests, research topic distributions and popularity. As a complement, DTM can address the evolving trend of each topic by displaying the proportion changes of key words, which is consistent with the changes of MeSH headings.

17.
Article in English | MEDLINE | ID: mdl-26306259

ABSTRACT

Scientific literature is one of the popular resources for providing decision support at the point of care. It is highly desirable to bring the most relevant literature to support the evidence-based clinical decision-making process. Motivated by recent advances in semantically enhanced information retrieval, we have developed a system that aims to bring semantically enriched literature, Semantic Medline, to meet the information needs at the point of care. This study reports our work towards operationalizing the system for real-time use. We demonstrate that the migration of a relational database implementation to a NoSQL (Not only SQL) implementation significantly improves the performance and makes the use of Semantic Medline for point-of-care decision support possible.

18.
Stud Health Technol Inform ; 216: 1033-4, 2015.
Article in English | MEDLINE | ID: mdl-26262333

ABSTRACT

In clinical NLP, one major barrier to adopting crowdsourcing for NLP annotation is the issue of confidentiality for protected health information (PHI) in clinical narratives. In this paper, we investigated the use of a frequency-based approach to extract sentences without PHI. Our approach is based on the assumption that sentences appearing frequently tend to contain no PHI. Both manual and automatic evaluations on 500 sentences out of the 7.9 million sentences with frequencies higher than one show that no PHI can be found among them. The promising results suggest the potential of releasing those sentences to obtain sentence-level NLP annotations via crowdsourcing.
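
The frequency-based idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the normalization step (lowercasing and collapsing whitespace), the function names, and the toy sentences are hypothetical, and the abstract does not describe how sentences were actually split or matched.

```python
from collections import Counter


def normalize(sentence):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(sentence.lower().split())


def frequent_sentences(sentences, min_count=2):
    """Return sentences seen at least min_count times, with their counts."""
    counts = Counter(normalize(s) for s in sentences)
    return {s: n for s, n in counts.items() if n >= min_count}


# Toy corpus with invented sentences; the third one mimics a PHI-bearing line.
notes = [
    "No acute distress.",
    "no acute  distress.",
    "Mr. Smith seen on 3/4.",
    "Lungs clear.",
]
candidates = frequent_sentences(notes)
```

With `min_count=2` this keeps sentences whose frequency is higher than one, matching the threshold stated in the abstract. Repetition alone is a heuristic, so any released set would still need the kind of manual and automatic PHI checks the authors report.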


Subject(s)
Crowdsourcing/methods; Data Interpretation, Statistical; Electronic Health Records/classification; Machine Learning; Natural Language Processing; Semantics; Language; Minnesota; Pattern Recognition, Automated/methods; Terminology as Topic; Vocabulary, Controlled
19.
BMC Bioinformatics ; 16: 185, 2015 Jun 06.
Article in English | MEDLINE | ID: mdl-26047637

ABSTRACT

BACKGROUND: Advances in next-generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence-level association extraction with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. RESULTS: We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse-level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation-disease associations in curated database records. The discourse-level analysis component of MutD contributed to a gain of more than 10% in F-measure when compared against sentence-level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. CONCLUSIONS: Our quantitative analysis reveals that MutD can effectively extract protein mutation-disease associations when benchmarking based on curated database records. The analysis also demonstrates that incorporating discourse-level analysis significantly improved the performance of extracting protein mutation-disease associations. Future work includes the extension of MutD to full-text articles.


Subject(s)
Algorithms; Computational Biology/methods; Data Mining/methods; Disease/genetics; Medical Subject Headings; Mutation/genetics; Publications; Databases, Factual; Genetic Variation/genetics; High-Throughput Nucleotide Sequencing; Humans; Natural Language Processing
20.
BioData Min ; 8: 11, 2015.
Article in English | MEDLINE | ID: mdl-25984237

ABSTRACT

BACKGROUND: To facilitate the implementation of the Family Smoking Prevention and Tobacco Control Act of 2009, the Food and Drug Administration (FDA) Center for Tobacco Products (CTP) has identified research priorities under the umbrella of tobacco regulatory science (TRS). As a newly integrated field, the current boundaries and landscape of TRS research are in need of definition. In this work, we conducted a bibliometric study of TRS research by applying author topic modeling (ATM) on MEDLINE citations published by currently funded TRS principal investigators (PIs). RESULTS: We compared topics generated with ATM on a dataset collected with TRS PIs and topics generated with ATM on a dataset collected with a TRS keyword list. It is found that all those topics show a good alignment with FDA's funding protocols. More interestingly, we can see clear interactive relationships among PIs and between PIs and topics. Based on those interactions, we can discover how diverse each PI is, how productive they are, which topics are more popular and what main components each topic involves. Temporal trend analysis of key words shows the significant evaluation in four prime TRS areas. CONCLUSIONS: The results show that ATM can efficiently group articles into discriminative categories without any supervision. This indicates that we may incorporate ATM into author identification systems to infer the identity of an author of articles using topics generated by the model. It can also be useful to grantees and funding administrators in suggesting potential collaborators or identifying those that share common research interests for data harmonization or other purposes. The incorporation of temporal analysis can be employed to assess the change over time in TRS as new projects are funded and the extent to which new research reflects the funding priorities of the FDA.
