ABSTRACT
BACKGROUND: Lyme disease is one of the most commonly reported infectious diseases in the United States (US), accounting for more than [Formula: see text] of all vector-borne diseases in North America. OBJECTIVE: In this paper, self-reported tweets on Twitter were analyzed to predict potential Lyme disease cases and accurately assess incidence rates in the US. METHODS: The study was done in three stages. (1) Approximately 1.3 million tweets were collected and preprocessed to extract the most relevant geolocated Lyme disease tweets. A subset of tweets was semi-automatically labeled as relevant or irrelevant to Lyme disease using a set of precise keywords, and the remainder was labeled manually, yielding a curated labeled dataset of 77,500 tweets. (2) This labeled dataset was used to train, validate, and test various combinations of natural language processing (NLP) word embedding methods and prominent machine learning (ML) classification models, such as TF-IDF with logistic regression, Word2vec with XGBoost, and BERTweet, among others, to identify potential Lyme disease tweets. (3) Finally, spatiotemporal patterns in the US over a 10-year period were studied. RESULTS: Preliminary results showed that BERTweet outperformed all other tested NLP classifiers for identifying Lyme disease tweets, achieving the highest classification accuracy and F1-score of [Formula: see text]. There was also a consistent pattern indicating that the West and Northeast regions of the US had a higher tweet rate over time. CONCLUSIONS: We focused on the less-studied problem of using Twitter data as a surveillance tool for Lyme disease in the US. Several crucial findings emerged from the study. First, there is a fairly strong correlation between classified tweet counts and Lyme disease case counts, with both following similar trends. Second, in 2015 and early 2016, social media networks such as Twitter were essential in raising public awareness of Lyme disease. Third, counties with a high incidence rate did not necessarily have a high tweet rate, and vice versa. Fourth, BERTweet can be used as a reliable NLP classifier for detecting relevant Lyme disease tweets.
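The abstract names TF-IDF with logistic regression as one of the baseline pipelines but does not state which TF-IDF variant was used. The following is a minimal sketch of textbook TF-IDF weighting, assuming tweets have already been preprocessed and tokenized; the function name `tf_idf` and the example tokens are hypothetical. It uses the unsmoothed idf log(N/df), which differs from the smoothed default of common libraries such as scikit-learn.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    Term frequency is the raw count divided by document length; inverse
    document frequency is the unsmoothed log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        # An empty document yields an empty dict, so no division by zero occurs.
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical tokenized tweets.
docs = [["lyme", "tick", "bite"], ["lyme", "rash", "fatigue"], ["lyme", "lime", "juice"]]
vecs = tf_idf(docs)
print(vecs[0]["lyme"])  # 0.0
```

A term that appears in every document, like "lyme" here, receives zero weight, so the weighting emphasizes terms that distinguish one tweet from another; the resulting sparse vectors would then be fed to a classifier such as logistic regression.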
Subjects
Lyme Disease, Social Media, United States/epidemiology, Humans, Incidence, Machine Learning, Self Report, Lyme Disease/diagnosis, Lyme Disease/epidemiology
ABSTRACT
BACKGROUND: Lyme disease is among the most reported tick-borne diseases worldwide, making it a major ongoing public health concern. An effective Lyme disease case reporting system depends on timely diagnosis and reporting by health care professionals and on accurate laboratory testing and interpretation to validate clinical diagnoses. A lack of these can lead to delayed diagnosis and treatment, which can exacerbate the severity of Lyme disease symptoms. Therefore, there is a need to improve the monitoring of Lyme disease by using other data sources, such as web-based data. OBJECTIVE: We analyzed global Twitter data to understand its potential and limitations as a tool for Lyme disease surveillance. We propose a transformer-based classification system to identify potential Lyme disease cases using self-reported tweets. METHODS: Our initial sample included 20,000 tweets collected worldwide from a database of over 1.3 million Lyme disease tweets. After the tweets were preprocessed and geolocated, a subset of the initial sample was manually labeled as potential Lyme disease cases or non-Lyme disease cases using carefully selected keywords. Emojis were converted to sentiment words, which then replaced the emojis in the tweets. This labeled tweet set was used for the training, validation, and performance testing of DistilBERT (a distilled version of BERT [Bidirectional Encoder Representations from Transformers]), ALBERT (A Lite BERT), and BERTweet (BERT for English Tweets) classifiers. RESULTS: The empirical results showed that BERTweet was the best classifier among all evaluated models (average F1-score of 89.3%, classification accuracy of 90.0%, and precision of 97.1%). However, for recall, term frequency-inverse document frequency and k-nearest neighbors performed better (93.2% and 82.6%, respectively). When emojis were used to enrich the tweet embeddings, BERTweet's recall increased (8% increase), DistilBERT's F1-score increased to 93.8% (4% increase) and its classification accuracy to 94.1% (4% increase), and ALBERT's F1-score increased to 93.1% (5% increase) and its classification accuracy to 93.9% (5% increase). General awareness of Lyme disease was high in the United States, the United Kingdom, Australia, and Canada: self-reported potential cases of Lyme disease from these countries accounted for around 50% (9939/20,000) of the collected English-language tweets, whereas Lyme disease-related tweets were rare in countries in Africa and Asia. The most reported Lyme disease-related symptoms in the data were rash, fatigue, fever, and arthritis, while symptoms such as lymphadenopathy, palpitations, swollen lymph nodes, neck stiffness, and arrhythmia were uncommon, in accordance with Lyme disease symptom frequency. CONCLUSIONS: The study highlights the robustness of BERTweet and DistilBERT as classifiers of potential Lyme disease cases from self-reported data. The results demonstrated that emojis are effective for enrichment, improving the accuracy of tweet embeddings and the performance of the classifiers. Specifically, emojis reflecting sadness, empathy, and encouragement can reduce false negatives.
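The study replaced emojis with sentiment words before the tweets were embedded, but the abstract does not give the emoji-to-word mapping. A minimal sketch of that preprocessing step follows, assuming a small hypothetical lookup table (`EMOJI_SENTIMENT`) that covers only single-code-point emojis for the sadness, empathy, and encouragement sentiments mentioned in the conclusions; the function name `replace_emojis` is also illustrative.

```python
# Hypothetical emoji-to-sentiment-word table; the study's actual mapping is not given.
EMOJI_SENTIMENT = {
    "\U0001F622": "sad",            # CRYING FACE
    "\U0001F917": "empathy",        # HUGGING FACE
    "\U0001F4AA": "encouragement",  # FLEXED BICEPS
}


def replace_emojis(text: str, mapping: dict[str, str] = EMOJI_SENTIMENT) -> str:
    """Replace each mapped emoji with its sentiment word.

    Works character by character, so multi-code-point emoji sequences
    (skin-tone modifiers, ZWJ sequences) are not handled. Unmapped
    characters are kept unchanged.
    """
    out = []
    for ch in text:
        word = mapping.get(ch)
        # Pad the word with spaces so it becomes a separate token.
        out.append(f" {word} " if word else ch)
    # Collapse all whitespace runs (including any padding) to single spaces.
    return " ".join("".join(out).split())


print(replace_emojis("Tested positive for Lyme \U0001F622"))
# Tested positive for Lyme sad
```

Turning emojis into ordinary words lets a tokenizer and embedding model that may not represent the emoji itself still see its sentiment; a production pipeline would more likely use a full emoji name or sentiment lexicon than a hand-written table.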
Subjects
Deep Learning, Lyme Disease, Social Media, Humans, United States, Self Report, Lyme Disease/diagnosis, Lyme Disease/epidemiology, Attitude
ABSTRACT
Oligonucleotides containing phosphorothioate (PS) linkages have recently demonstrated significant clinical utility. PS oligonucleotides are manufactured by solid-phase chain elongation, in which a four-reaction cycle of detritylation, coupling, sulfurization, and capping of failure sequences with Ac2O is repeated. In the capping step, uncoupled sequences are acetylated at the 5'-OH to stop chain growth and control the levels of deletion, or (n-1), impurities. Herein, we report that the byproducts of commonly used sulfurization reagents react with the 5'-OH and cap the failure sequences. The standard Ac2O capping step can therefore be eliminated, and the resulting three-reaction cycle affords a higher yield and higher or comparable overall purity compared with the conventional four-reaction synthesis. This reduces the number of reactions from ~80 to ~60 for the synthesis of a typical 20-mer oligonucleotide. For every kilogram of oligonucleotide intermediate synthesized, >500 L of reagents and organic solvents are saved, and the E-factor decreases from ~2000 to <1500.
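The reaction counts quoted above follow from multiplying the number of reactions per cycle by the number of elongation cycles, and the E-factor is the mass of waste generated per mass of product. The sketch below reproduces the ~80 and ~60 figures under the simplifying assumption of one elongation cycle per nucleotide; the function names `reaction_count` and `e_factor` are illustrative only.

```python
def reaction_count(chain_length: int, reactions_per_cycle: int) -> int:
    """Approximate number of reactions in a solid-phase synthesis.

    Assumes one elongation cycle per nucleotide. This is a simplification:
    a support pre-loaded with the first nucleoside needs chain_length - 1
    couplings, which is why the figures are approximate.
    """
    return chain_length * reactions_per_cycle


def e_factor(waste_kg: float, product_kg: float) -> float:
    """E-factor: mass of waste generated per unit mass of product."""
    return waste_kg / product_kg


# Detritylation, coupling, sulfurization, capping.
conventional = reaction_count(20, 4)
# Capping performed by the sulfurization byproducts, so no separate step.
capping_free = reaction_count(20, 3)
print(conventional, capping_free)  # 80 60
```

Under this approximation the saving scales linearly with chain length: every elongation cycle drops one reaction, a 25% reduction regardless of the oligonucleotide length.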
Subjects
Phosphorothioate Oligonucleotides/chemistry, Phosphorothioate Oligonucleotides/chemical synthesis, Sulfur/chemistry, Base Sequence, Phosphorothioate Oligonucleotides/genetics, Solid-Phase Synthesis Techniques
ABSTRACT
Hydrophobic interaction chromatography (HIC) is commonly used to separate protein monomer and aggregate species in the purification of protein therapeutics. Despite its frequent use, the HIC separation mechanism is complex and not well understood. In this paper, we examined the separation of a mixture of monomer and aggregate protein using Phenyl Sepharose FF. The adsorption, desorption, and diffusion of the two species were evaluated with several experimental approaches to determine which processes controlled the separation. A chromatography model using homogeneous diffusion (to describe mass transfer) and a competitive binary Langmuir isotherm (to describe protein adsorption and desorption) was formulated and used to predict the separation of the monomer and aggregate species. The experimental studies showed that a fraction of the aggregate species bound irreversibly to the adsorbent, which was a major factor governing the separation. Including irreversible binding in the adsorption mechanism greatly improved the model predictions over a range of operating conditions, and the model successfully predicted the separation performance of the adsorbent with the examined feed.
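The competitive (multicomponent) Langmuir isotherm used in the model expresses the bound concentration of each species as a function of all liquid-phase concentrations through a shared denominator, so monomer and aggregate compete for the same sites. A minimal sketch follows, assuming component-specific saturation capacities; the function name and parameter values are hypothetical, and the irreversible-binding term the study added is omitted because its form is not given here.

```python
def competitive_langmuir(c: list[float], q_max: list[float], k: list[float]) -> list[float]:
    """Bound concentration of each species in a multicomponent mixture:

        q_i = q_max_i * K_i * c_i / (1 + sum_j K_j * c_j)

    The shared denominator couples the species: raising one species'
    concentration lowers the binding of the others.
    """
    if not (len(c) == len(q_max) == len(k)):
        raise ValueError("c, q_max and k must have the same length")
    denom = 1.0 + sum(kj * cj for kj, cj in zip(k, c))
    return [qm * ki * ci / denom for qm, ki, ci in zip(q_max, k, c)]


# Hypothetical parameters: monomer at index 0, aggregate at index 1.
q = competitive_langmuir(c=[1.0, 0.5], q_max=[50.0, 50.0], k=[1.0, 4.0])
print(q)  # [12.5, 25.0]
```

With the larger affinity constant assigned to the aggregate, it occupies more sites than the monomer despite its lower liquid-phase concentration, which is the competition a binary isotherm is meant to capture.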