ABSTRACT
The pointwise mutual information statistic (PMI), which compares how often two words occur together in a document corpus with how often they would be expected to co-occur if they appeared independently, is a cornerstone of popular, recently proposed natural language processing algorithms such as word2vec. PMI and word2vec reveal semantic relationships between words and can be helpful in a range of applications such as document indexing, topic analysis, and document categorization. We use probability theory to demonstrate the relationship between PMI and word2vec, and we use these theoretical results to show how PMI can be modeled and estimated in a simple and straightforward manner. We further describe how to obtain standard error estimates that account for within-patient clustering, which arises because a patient's unique health history produces patterns of repeated words within that patient's health record. We then demonstrate the usefulness of PMI for the predictive identification of disease from free-text notes in electronic health records. Specifically, we use our methods to distinguish patients with and without type 2 diabetes mellitus in electronic health record free-text data, using over 400 000 clinical notes from an academic medical center.
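A minimal sketch of a plug-in (empirical) PMI estimate, assuming document-level co-occurrence: PMI(w1, w2) = log[P(w1, w2) / (P(w1) P(w2))], where each probability is the fraction of documents (here, notes) that contain the word or the pair. This is the textbook definition, not necessarily the modeling approach the abstract describes; the function name `document_pmi` and the toy notes are hypothetical. The sketch treats every note as an independent document, so it does not implement the within-patient clustering adjustment described above.

```python
import math
from collections import Counter
from itertools import combinations


def document_pmi(documents):
    """Plug-in PMI (natural log) for every word pair that co-occurs in some document.

    documents: list of token lists; a word or pair is counted at most once
    per document, so each probability is a fraction of documents.
    Returns a dict keyed by alphabetically sorted word pairs.
    """
    n_docs = len(documents)
    word_docs = Counter()  # number of documents containing each word
    pair_docs = Counter()  # number of documents containing both words of a pair
    for doc in documents:
        vocab = sorted(set(doc))
        word_docs.update(vocab)
        pair_docs.update(combinations(vocab, 2))

    pmi = {}
    for (w1, w2), n12 in pair_docs.items():
        p12 = n12 / n_docs
        p1 = word_docs[w1] / n_docs
        p2 = word_docs[w2] / n_docs
        pmi[(w1, w2)] = math.log(p12 / (p1 * p2))
    return pmi


# Hypothetical toy "clinical notes", already tokenized.
notes = [
    ["insulin", "metformin", "glucose"],
    ["insulin", "glucose"],
    ["fracture", "cast"],
    ["glucose", "fracture"],
]
pmi = document_pmi(notes)
print(round(pmi[("glucose", "insulin")], 4))  # 0.2877, i.e. log(4/3)
print(round(pmi[("cast", "fracture")], 4))    # 0.6931, i.e. log(2)
```

Counting each word at most once per document keeps P(w1, w2) no larger than min(P(w1), P(w2)). Pairs that never co-occur are left out of the result, since their PMI would be negative infinity.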
Subject(s)
Diabetes Mellitus, Type 2 , Natural Language Processing , Algorithms , Electronic Health Records , Humans
ABSTRACT
We collected 2768 influenza-like illness emergency public health incidents reported in the Emergency Public Reporting System from April 1, 2005 to November 30, 2013. After screening with strict inclusion and exclusion criteria, 613 outbreaks were analyzed with a susceptible-exposed-infectious/asymptomatic-removed (SEIAR) model to estimate the proportion of asymptomatic individuals (p) and the effective reproduction number (Rt). The relationships between Rt and viral subtype, region, outbreak site, population, and season were analyzed. The mean values of p for the different subtypes ranged from 0.09 to 0.15, but p could be as high as 0.94. Rt differed significantly across subtypes, provinces, regions, and outbreak sites. In particular, in the southern region Rt also differed by affected population size and by season. Our results provide a reference for China and the rest of the world to understand the transmission characteristics of these outbreaks and to develop prevention and control strategies.
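A minimal sketch of a generic SEIAR compartmental model, included only to show how the asymptomatic proportion p and the effective reproduction number Rt enter such a model. The abstract does not state the model equations, so the frequency-dependent transmission term, the parameter names (beta, kappa, omega, gamma, gamma_a), the formula Rt = R0 * S(t) / N, and the example parameter values are all assumptions for illustration. The sketch runs a forward simulation only; it does not fit p or Rt to outbreak data as the study did.

```python
from dataclasses import dataclass


@dataclass
class SEIARParams:
    beta: float     # transmission rate per day (illustrative assumption)
    kappa: float    # relative infectiousness of asymptomatic cases (assumption)
    p: float        # proportion of infections that are asymptomatic
    omega: float    # 1 / latent period, per day
    gamma: float    # 1 / infectious period of symptomatic cases, per day
    gamma_a: float  # 1 / infectious period of asymptomatic cases, per day


def seiar_rhs(state, prm):
    """Derivatives (dS, dE, dI, dA, dR) of a frequency-dependent SEIAR model."""
    s, e, i, a, r = state
    n = s + e + i + a + r
    force = prm.beta * s * (i + prm.kappa * a) / n  # new infections per day
    return (
        -force,
        force - prm.omega * e,
        (1 - prm.p) * prm.omega * e - prm.gamma * i,
        prm.p * prm.omega * e - prm.gamma_a * a,
        prm.gamma * i + prm.gamma_a * a,
    )


def simulate(state, prm, days, dt=0.1):
    """Fixed-step fourth-order Runge-Kutta; returns the state at each whole day."""
    steps_per_day = round(1 / dt)
    state = list(state)
    trajectory = [tuple(state)]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = seiar_rhs(state, prm)
            k2 = seiar_rhs([x + dt / 2 * k for x, k in zip(state, k1)], prm)
            k3 = seiar_rhs([x + dt / 2 * k for x, k in zip(state, k2)], prm)
            k4 = seiar_rhs([x + dt * k for x, k in zip(state, k3)], prm)
            state = [
                x + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                for x, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)
            ]
        trajectory.append(tuple(state))
    return trajectory


def basic_reproduction_number(prm):
    """R0 = beta * [(1 - p) / gamma + kappa * p / gamma_a] for this model."""
    return prm.beta * ((1 - prm.p) / prm.gamma + prm.kappa * prm.p / prm.gamma_a)


def effective_reproduction_number(state, prm):
    """Rt = R0 * S(t) / N, the susceptible-depletion form of Rt."""
    s = state[0]
    n = sum(state)
    return basic_reproduction_number(prm) * s / n


# Illustrative values only (not estimates from the study).
prm = SEIARParams(beta=0.8, kappa=0.5, p=0.12, omega=1 / 1.4, gamma=1 / 3, gamma_a=1 / 3)
trajectory = simulate([499, 0, 1, 0, 0], prm, days=60)  # e.g. a 500-person school
print(round(basic_reproduction_number(prm), 3))  # 2.256
print(round(effective_reproduction_number(trajectory[-1], prm), 3))
```

A larger p lowers R0 in this sketch whenever kappa < 1 and gamma_a = gamma, because asymptomatic cases are assumed to transmit less. Fitting p and Rt to an observed outbreak would add an estimation step (for example, least squares against reported case counts) on top of this simulation.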
Subject(s)
Influenza, Human/transmission , China/epidemiology , Disease Outbreaks , Humans , Influenza, Human/epidemiology , Influenza, Human/virology , Models, Theoretical , Orthomyxoviridae/classification , Orthomyxoviridae/isolation & purification , Population Surveillance , Seasons
ABSTRACT
H6 avian influenza viruses (AIVs), which are prevalent in domestic and wild birds in Eurasian countries, have been isolated from pigs, a dog, and a human. Routine virological surveillance was conducted at live poultry markets and poultry farms in southern China from 2009 to 2011. This study investigated the genetic and antigenic characteristics of the isolated AIVs, analyzed their receptor-binding properties, and evaluated the kinetics of their infectivity in A549, MDCK, and PK15 cells. A total of 14 H6N6 and 2 H6N2 isolates were obtained from four provinces in southern China. Genetic analysis indicated that two distinct hemagglutinin lineages of H6 strains cocirculate in southern China and that these strains have facilitated active evolution and reassortment among multiple influenza virus subtypes from different avian species in nature. None of these isolates grouped with the novel Taiwan H6N1 virus that caused human infection. Receptor-binding specificity assays showed that five H6 AIVs may have acquired the ability to recognize human receptors. Growth kinetics experiments showed that EV/HB-JZ/02/10(H6N2) and EV/JX/15/10(H6N6) initially replicated faster and reached higher titers than the other viruses, suggesting that enhanced binding to α-2,6-linked sialic acids correlates with increased viral replication in mammalian cells. Overall, the results emphasize the need for continued surveillance of H6 outbreaks and extensive characterization of H6 isolates to better understand their genetic changes and the implications of those changes.