1.
Bioinformatics ; 38(12): 3252-3258, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35441678

ABSTRACT

MOTIVATION: As the number of public data resources continues to proliferate, identifying relevant datasets across heterogenous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. RESULTS: Developed through the National Heart, Lung and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15 911 study variables from public datasets. On a manually curated search dataset, Dug's total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results. AVAILABILITY AND IMPLEMENTATION: Dug is freely available at https://github.com/helxplatform/dug. An example Dug deployment is also available for use at https://search.biodatacatalyst.renci.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
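The recall and precision comparison above can be made concrete with a minimal sketch. It uses the textbook information-retrieval definitions (recall as relevant results retrieved over all relevant items, precision as relevant results retrieved over all results); the abstract does not give the authors' exact evaluation protocol, and the function name and variable IDs below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: list of result IDs returned by the search tool
    relevant:  set of IDs a curator judged relevant to the query
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical variable IDs, not taken from the Dug index.
retrieved = ["v1", "v2", "v3", "v4"]
relevant = {"v1", "v3", "v5"}
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Averaging these per-query values over a curated query set is one plausible way to arrive at summary figures like those reported.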


Subject(s)
Search Engine , Semantics , Ecosystem , Abstracting and Indexing
2.
Prev Med ; 177: 107783, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37980956

ABSTRACT

BACKGROUND: Firearm violence represents a public health crisis in the United States. Yet, there is limited knowledge about how firearms are discussed in the context of mental health emergencies, representing a major gap in the current research literature. This study addresses this gap by examining whether the content of mental health crisis text conversations that mention firearms differs from that of conversations that do not mention firearms in a large, unique dataset from a national crisis text line. METHODS: We examined data from over 3.2 million conversations between texters to Crisis Text Line and volunteer crisis counselors between September 2018 and July 2022. We used a study-developed text classification machine learning algorithm that builds on natural language processing to identify and label whether crisis conversations mentioned firearms. We compared the frequency of psychosocial factors between conversations that mentioned firearms and those that did not. RESULTS: Results from a generalized linear mixed-effects model demonstrated that conversations mentioning firearms were more frequently associated with suicide, racism, physical, sexual, emotional, and unspecified abuse, grief, concerns about a third party, substance use, bullying, gender and sexual identity, relationships, depression, and loneliness. Further, conversations mentioning firearms were less likely to be related to self-harm and eating/body image. CONCLUSIONS: These results offer an initial glimpse of how firearms are mentioned in the context of acute mental health emergencies, which has been completely absent in prior literature. Our results are preliminary and help sharpen our understanding of contextual factors surrounding mental health emergencies where a firearm is mentioned.


Subject(s)
Firearms , Self-Injurious Behavior , Suicide , Humans , United States , Mental Health , Emergencies , Suicide/psychology
3.
Sensors (Basel) ; 19(21)2019 Oct 23.
Article in English | MEDLINE | ID: mdl-31652820

ABSTRACT

Exposure assessment studies are the primary means for understanding links between exposure to chemical and physical agents and adverse health effects. Recently, researchers have proposed using wearable monitors during exposure assessment studies to obtain higher fidelity readings of exposures actually experienced by subjects. However, limited research has been conducted to link a wearer's actions to periods of exposure, a necessary step for estimating inhaled dosage. To aid researchers in these settings, we developed a machine learning model for identifying periods of bicycling activity using passively collected data from the RTI MicroPEM wearable exposure monitor, a lightweight device capable of continuously sampling both air pollution levels and accelerometry parameters. Our best performing model identifies biking activity with a mean leave-one-session-out (LOSO) cross-validation F1 score of 0.832 (unweighted) and 0.979 (weighted). Accelerometer-derived features contributed greatly to the model performance, as did temporal smoothing of the predicted activities. Additionally, we found competitive activity recognition can occur with even relatively low sampling rates, suggesting suitability for exposure assessment studies where continuous data collection for long periods (without recharge) is needed to capture realistic daily routines and exposures.
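Temporal smoothing of predicted activities, mentioned above, can be sketched as a sliding-window majority vote. This is only one common smoothing scheme; the abstract does not say which the authors used, and the function name, window size, and labels here are assumptions for illustration.

```python
from collections import Counter


def smooth_predictions(labels, window=5):
    """Replace each label with the majority label in a centered window.

    The window is truncated at the start and end of the sequence.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        counts = Counter(labels[lo:hi])
        smoothed.append(counts.most_common(1)[0][0])
    return smoothed


# Hypothetical per-interval predictions with isolated misclassifications.
raw = ["bike", "bike", "other", "bike", "bike",
       "other", "other", "bike", "other", "other"]
print(smooth_predictions(raw, window=3))
```

With a window of 3, the isolated "other" inside the biking run and the isolated "bike" inside the non-biking run are both overruled by their neighbors.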


Subject(s)
Sports , Wearable Electronic Devices , Accelerometry , Air Pollutants/analysis , Environmental Monitoring , Humans , Machine Learning
4.
Int J Health Geogr ; 17(1): 12, 2018 05 09.
Article in English | MEDLINE | ID: mdl-29743081

ABSTRACT

BACKGROUND: Conducting surveys in low- and middle-income countries is often challenging because many areas lack a complete sampling frame, have outdated census information, or have limited data available for designing and selecting a representative sample. Geosampling is a probability-based, gridded population sampling method that addresses some of these issues by using geographic information system (GIS) tools to create logistically manageable area units for sampling. GIS grid cells are overlaid to partition a country's existing administrative boundaries into area units that vary in size from 50 m × 50 m to 150 m × 150 m. To avoid sending interviewers to unoccupied areas, researchers manually classify grid cells as "residential" or "nonresidential" through visual inspection of aerial images. "Nonresidential" units are then excluded from sampling and data collection. This process of manually classifying sampling units has drawbacks since it is labor intensive, prone to human error, and creates the need for simplifying assumptions during calculation of design-based sampling weights. In this paper, we discuss the development of a deep learning classification model to predict whether aerial images are residential or nonresidential, thus reducing manual labor and eliminating the need for simplifying assumptions. RESULTS: On our test sets, the model performs comparably to a human-level baseline in both Nigeria (94.5% accuracy) and Guatemala (96.4% accuracy), and outperforms baseline machine learning models trained on crowdsourced or remote-sensed geospatial features. Additionally, our findings suggest that this approach can work well in new areas with relatively modest amounts of training data. CONCLUSIONS: Gridded population sampling methods like geosampling are becoming increasingly popular in countries with outdated or inaccurate census data because of their timeliness, flexibility, and cost. Using deep learning models directly on satellite images, we provide a novel method for sample frame construction that identifies residential gridded aerial units. In cases where manual classification of satellite images is used to (1) correct for errors in gridded population data sets or (2) classify grids where population estimates are unavailable, this methodology can help reduce annotation burden with comparable quality to human analysts.


Subject(s)
Demography/classification , Developing Countries/classification , Neural Networks, Computer , Residence Characteristics/classification , Satellite Imagery/classification , Data Collection/classification , Data Collection/methods , Demography/methods , Guatemala/epidemiology , Humans , Nigeria/epidemiology , Satellite Imagery/methods
5.
J Med Internet Res ; 19(7): e236, 2017 07 04.
Article in English | MEDLINE | ID: mdl-28676471

ABSTRACT

BACKGROUND: Twitter represents a social media platform through which medical cannabis dispensaries can rapidly promote and advertise a multitude of retail products. Yet, to date, no studies have systematically evaluated Twitter behavior among dispensaries and how these behaviors influence the formation of social networks. OBJECTIVES: This study sought to characterize common cyberbehaviors and shared follower networks among dispensaries operating in two large cannabis markets in California. METHODS: From a targeted sample of 119 dispensaries in the San Francisco Bay Area and Greater Los Angeles, we collected metadata from the dispensary accounts using the Twitter API. For each city, we characterized the network structure of dispensaries based upon shared followers, then empirically derived communities with the Louvain modularity algorithm. Principal components factor analysis was employed to reduce 12 Twitter measures into a more parsimonious set of cyberbehavioral dimensions. Finally, quadratic discriminant analysis was implemented to verify the ability of the extracted dimensions to classify dispensaries into their derived communities. RESULTS: The modularity algorithm yielded three communities in each city with distinct network structures. The principal components factor analysis reduced the 12 cyberbehaviors into five dimensions that encompassed account age, posting frequency, referencing, hyperlinks, and user engagement among the dispensary accounts. In the quadratic discriminant analysis, the dimensions correctly classified 75% (46/61) of the communities in the San Francisco Bay Area and 71% (41/58) in Greater Los Angeles. CONCLUSIONS: The most centralized and strongly connected dispensaries in both cities had newer accounts, higher daily activity, more frequent user engagement, and increased usage of embedded media, keywords, and hyperlinks. 
Measures derived from both network structure and cyberbehavioral dimensions can serve as key contextual indicators for the online surveillance of cannabis dispensaries and consumer markets over time.
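The shared-follower network described in the methods can be sketched as a weighted edge list, where the weight between two accounts is the number of followers they have in common. This is a minimal illustration, not the authors' pipeline; the function name, handles, and follower IDs are hypothetical.

```python
from itertools import combinations


def shared_follower_edges(followers, min_shared=1):
    """Build weighted edges between accounts from overlapping follower sets.

    followers: dict mapping account handle -> set of follower IDs
    Returns a dict mapping (account_a, account_b) -> shared follower count.
    """
    edges = {}
    for a, b in combinations(sorted(followers), 2):
        shared = len(followers[a] & followers[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical handles and follower IDs.
followers = {
    "dispensary_a": {1, 2, 3, 4},
    "dispensary_b": {3, 4, 5},
    "dispensary_c": {6, 7},
}
edges = shared_follower_edges(followers)
print(edges)
```

A weighted edge list of this kind is the usual input to a modularity-based community detection routine such as the Louvain method named in the abstract.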


Subject(s)
Cannabis/growth & development , Internet/statistics & numerical data , Social Media/statistics & numerical data , Social Networking , California , Humans
6.
Res Synth Methods ; 2024 Mar 03.
Article in English | MEDLINE | ID: mdl-38432227

ABSTRACT

Data extraction is a crucial, yet labor-intensive and error-prone part of evidence synthesis. To date, efforts to harness machine learning for enhancing efficiency of the data extraction process have fallen short of achieving sufficient accuracy and usability. With the release of large language models (LLMs), new possibilities have emerged to increase efficiency and accuracy of data extraction for evidence synthesis. The objective of this proof-of-concept study was to assess the performance of an LLM (Claude 2) in extracting data elements from published studies, compared with human data extraction as employed in systematic reviews. Our analysis utilized a convenience sample of 10 English-language, open-access publications of randomized controlled trials included in a single systematic review. We selected 16 distinct types of data, posing varying degrees of difficulty (160 data elements across 10 studies). We used the browser version of Claude 2 to upload the portable document format of each publication and then prompted the model for each data element. Across 160 data elements, Claude 2 demonstrated an overall accuracy of 96.3% with a high test-retest reliability (replication 1: 96.9%; replication 2: 95.0% accuracy). Overall, Claude 2 made 6 errors on 160 data items. The most common errors (n = 4) were missed data items. Importantly, Claude 2's ease of use was high; it required no technical expertise or labeled training data for effective operation (i.e., zero-shot learning). Based on findings of our proof-of-concept study, leveraging LLMs has the potential to substantially enhance the efficiency and accuracy of data extraction for evidence syntheses.
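The headline figures above are internally consistent, as a short arithmetic check shows: 16 data types across 10 studies give 160 data elements, and 6 errors leave 154 correct, or 96.25%, which rounds to the reported 96.3%. The sketch below uses `decimal` with half-up rounding because binary floating point would format 96.25 as 96.2; the function name is an assumption for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def accuracy_percent(n_items, n_errors):
    """Percentage of correct items, rounded half-up to one decimal place."""
    correct = Decimal(n_items - n_errors)
    pct = correct / Decimal(n_items) * 100
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


n_items = 16 * 10  # 16 data types across 10 studies
print(n_items, accuracy_percent(n_items, 6))
```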

7.
Res Synth Methods ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38895747

ABSTRACT

Accurate data extraction is a key component of evidence synthesis and critical to valid results. The advent of publicly available large language models (LLMs) has generated interest in these tools for evidence synthesis and created uncertainty about the choice of LLM. We compare the performance of two widely available LLMs (Claude 2 and GPT-4) for extracting pre-specified data elements from 10 published articles included in a previously completed systematic review. We use prompts and full study PDFs to compare the outputs from the browser versions of Claude 2 and GPT-4. GPT-4 required use of a third-party plugin to upload and parse PDFs. Accuracy was high for Claude 2 (96.3%). The accuracy of GPT-4 with the plug-in was lower (68.8%); however, most of the errors were due to the plug-in. Both LLMs correctly recognized when prespecified data elements were missing from the source PDF and generated correct information for data elements that were not reported explicitly in the articles. A secondary analysis demonstrated that, when provided selected text from the PDFs, Claude 2 and GPT-4 accurately extracted 98.7% and 100% of the data elements, respectively. Limitations include the narrow scope of the study PDFs used, that prompt development was completed using only Claude 2, and that we cannot guarantee the open-source articles were not used to train the LLMs. This study highlights the potential for LLMs to revolutionize data extraction but underscores the importance of accurate PDF parsing. For now, it remains essential for a human investigator to validate LLM extractions.

8.
medRxiv ; 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38343863

ABSTRACT

Preventing and treating post-acute sequelae of SARS-CoV-2 infection (PASC), commonly known as Long COVID, has become a public health priority. In this study, we examined whether treatment with Paxlovid in the acute phase of COVID-19 helps prevent the onset of PASC. We used electronic health records from the National Covid Cohort Collaborative (N3C) to define a cohort of 426,352 patients who had COVID-19 since April 1, 2022, and were eligible for Paxlovid treatment due to risk for progression to severe COVID-19. We used the target trial emulation (TTE) framework to estimate the effect of Paxlovid treatment on PASC incidence. We estimated overall PASC incidence using a computable phenotype. We also measured the onset of novel cognitive, fatigue, and respiratory symptoms in the post-acute period. Paxlovid treatment did not have a significant effect on overall PASC incidence (relative risk [RR] = 0.98, 95% confidence interval [CI] 0.95-1.01). However, it had a protective effect on cognitive (RR = 0.90, 95% CI 0.84-0.96) and fatigue (RR = 0.95, 95% CI 0.91-0.98) symptom clusters, which suggests that the etiology of these symptoms may be more closely related to viral load than that of respiratory symptoms.
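For readers unfamiliar with how a relative risk and its confidence interval are read, the sketch below computes an unadjusted RR with a 95% interval using the standard log-scale approximation. It is deliberately simpler than the target trial emulation used in the study, which adjusts for confounding; the counts are hypothetical and the function name is an assumption.

```python
import math


def relative_risk(events_treated, n_treated, events_control, n_control, z=1.96):
    """Unadjusted relative risk with a log-scale (Wald) confidence interval."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    # Standard error of log(RR) for two independent binomial samples.
    se = math.sqrt(1 / events_treated - 1 / n_treated
                   + 1 / events_control - 1 / n_control)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts, not taken from the N3C cohort.
rr, lower, upper = relative_risk(90, 1000, 100, 1000)
print(f"RR={rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that spans 1, as in this toy example and in the overall PASC estimate above, is conventionally read as no statistically significant effect; intervals lying entirely below 1, as for the cognitive and fatigue clusters, indicate a protective association.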

9.
medRxiv ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38947087

ABSTRACT

Post-Acute Sequelae of SARS-CoV-2 infection (PASC), also known as Long-COVID, encompasses a variety of complex and varied outcomes following COVID-19 infection that are still poorly understood. We clustered over 600 million condition diagnoses from 14 million patients available through the National COVID Cohort Collaborative (N3C), generating hundreds of highly detailed clinical phenotypes. Assessing patient clinical trajectories using these clusters allowed us to identify individual conditions and phenotypes strongly increased after acute infection. We found many conditions increased in COVID-19 patients compared to controls, and using a novel method to associate patients with clusters over time, we additionally found phenotypes specific to patient sex, age, wave of infection, and PASC diagnosis status. While many of these results reflect known PASC symptoms, the resolution provided by this unprecedented data scale suggests avenues for improved diagnostics and mechanistic understanding of this multifaceted disease.

10.
JMIR Public Health Surveill ; 9: e42811, 2023 02 08.
Article in English | MEDLINE | ID: mdl-36753321

ABSTRACT

BACKGROUND: Mass shootings result in widespread psychological trauma for survivors and members of the affected community. However, less is known about the broader effects of indirect exposure (eg, media) to mass shootings. Crisis lines offer a unique opportunity to examine real-time data on the widespread psychological effects of mass shootings. OBJECTIVE: Crisis Text Line is a not-for-profit company that provides 24/7 confidential SMS text message-based mental health support and crisis intervention service. This study examines changes in the volume and composition of firearm-related conversations at Crisis Text Line before and after the mass school shooting at Robb Elementary School on May 24, 2022, in Uvalde, Texas. METHODS: A quasi-experimental event study design was used to compare the actual volume of firearm-related conversations received by Crisis Text Line post shooting to forecasted firearm conversation volume under the counterfactual scenario that a shooting had not occurred. Conversations related to firearms were identified among all conversations using keyword searches. Firearm conversation volume was predicted using a seasonal autoregressive integrated moving average model trained on the 3 months of data leading up to the shooting. Additionally, proportions of issue tags (topics coded post conversation by volunteer crisis counselors at Crisis Text Line after the exchange) were compared in the 4 days before (n=251) and after (n=417) the shooting to assess changes in conversation characteristics. The 4-day window was chosen to reflect the number of days conversation volume remained above forecasted levels. RESULTS: There was a significant increase in the number of conversations mentioning firearms following the shooting, with the largest spike (compared to forecasted numbers) occurring the day after the shooting (n=159) on May 25, 2022. By May 28, the volume reverted to within the 95% CI of the forecasted volume (n=77). 
Within firearm conversations, "grief" issue tags showed a significant increase in proportion in the week following the shooting, while "isolation/loneliness," "relationships," and "suicide" issue tags showed a significant decrease in proportions the week following the shooting. CONCLUSIONS: The results suggest that the Uvalde school shooting may have contributed to an increase in demand for crisis services, above what would be expected given historical trends. Additionally, we found that these firearm-related crisis conversations immediately post event are more likely to be related to grief and less likely to be related to suicide, loneliness, and relationships. Our findings provide some of the first data showing the real-time repercussions for the broader population exposed to school shooting events. This work adds to a growing evidence base documenting and measuring the rippling effects of mass shootings outside of those directly impacted.
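Comparing issue-tag proportions before (n=251) and after (n=417) the shooting can be illustrated with a two-proportion z-test. The abstract does not name the test the authors used, so this is an assumed stand-in, and the tag counts below are hypothetical; only the two group sizes come from the abstract.

```python
import math


def two_proportion_z(x_before, n_before, x_after, n_after):
    """Two-sided z-test for a difference between two proportions."""
    p_before = x_before / n_before
    p_after = x_after / n_after
    pooled = (x_before + x_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p_after - p_before) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts of conversations carrying a "grief" tag.
z, p_value = two_proportion_z(20, 251, 75, 417)
print(f"z={z:.2f}, p={p_value:.4f}")
```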


Subject(s)
Firearms , Mass Casualty Incidents , Wounds, Gunshot , Humans , Wounds, Gunshot/epidemiology , Texas/epidemiology , Schools
11.
AJPM Focus ; 2(1): 100045, 2023 Mar.
Article in English | MEDLINE | ID: mdl-37789939

ABSTRACT

Introduction: This study analyzes age-differentiated Reddit conversations about ENDS. Methods: This study combines 2 methods to (1) predict Reddit users' age into 2 categories (13-20 years [underage] and 21-54 years [of legal age]) using a machine learning algorithm and (2) qualitatively code ENDS-related Reddit posts within the 2 groups. The 25 posts with the highest karma score (number of upvotes minus number of downvotes) for each keyword search (i.e., query) and each predicted age group were qualitatively coded. Results: Of the 9 topics that emerged, the top 3 were flavor restriction policies, Tobacco 21 policies, and use. Opposition to flavor restriction policies was a prominent subcategory for both groups but was more common in the 21-54 group. The 13-20 group was more likely to discuss opposition to minimum age laws as well as access to flavored ENDS products. The 21-54 group commonly mentioned general vaping use behavior. Conclusions: Users predicted to be in the underage group posted about different ENDS-related topics on Reddit than users predicted to be in the of-legal-age group.
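The post selection rule in the methods (karma as upvotes minus downvotes, top 25 per query and predicted age group) can be sketched as follows. The record fields, function name, and sample posts are hypothetical; only the karma definition and the grouping come from the abstract.

```python
from collections import defaultdict


def top_posts(posts, k=25):
    """Return the k highest-karma post IDs for each (query, age_group) pair."""
    groups = defaultdict(list)
    for post in posts:
        karma = post["upvotes"] - post["downvotes"]
        groups[(post["query"], post["age_group"])].append((karma, post["id"]))
    return {
        key: [pid for _, pid in sorted(items, key=lambda t: t[0], reverse=True)[:k]]
        for key, items in groups.items()
    }


# Hypothetical posts; k is lowered to 2 to keep the example small.
posts = [
    {"id": "p1", "query": "vape flavors", "age_group": "13-20", "upvotes": 50, "downvotes": 5},
    {"id": "p2", "query": "vape flavors", "age_group": "13-20", "upvotes": 80, "downvotes": 10},
    {"id": "p3", "query": "vape flavors", "age_group": "21-54", "upvotes": 30, "downvotes": 2},
    {"id": "p4", "query": "vape flavors", "age_group": "13-20", "upvotes": 10, "downvotes": 9},
]
top = top_posts(posts, k=2)
print(top)
```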

12.
medRxiv ; 2023 May 04.
Article in English | MEDLINE | ID: mdl-37205340

ABSTRACT

This study leverages electronic health record data in the National COVID Cohort Collaborative's (N3C) repository to investigate disparities in Paxlovid treatment and to emulate a target trial assessing its effectiveness in reducing COVID-19 hospitalization rates. From an eligible population of 632,822 COVID-19 patients seen at 33 clinical sites across the United States between December 23, 2021 and December 31, 2022, patients were matched across observed treatment groups, yielding an analytical sample of 410,642 patients. We estimate a 65% reduced odds of hospitalization among Paxlovid-treated patients within a 28-day follow-up period, and this effect did not vary by patient vaccination status. Notably, we observe disparities in Paxlovid treatment, with lower rates among Black and Hispanic or Latino patients, and within socially vulnerable communities. Ours is the largest study of Paxlovid's real-world effectiveness to date, and our primary findings are consistent with previous randomized controlled trials and real-world studies.

13.
Nat Commun ; 14(1): 2914, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217471

ABSTRACT

Long COVID, or complications arising from COVID-19 weeks after infection, has become a central concern for public health experts. The United States National Institutes of Health founded the RECOVER initiative to better understand long COVID. We used electronic health records available through the National COVID Cohort Collaborative to characterize the association between SARS-CoV-2 vaccination and long COVID diagnosis. Among patients with a COVID-19 infection between August 1, 2021 and January 31, 2022, we defined two cohorts using distinct definitions of long COVID, a clinical diagnosis (n = 47,404) or a previously described computational phenotype (n = 198,514), to compare unvaccinated individuals to those with a complete vaccine series prior to infection. Evidence of long COVID was monitored through June or July of 2022, depending on patients' data availability. We found that vaccination was consistently associated with lower odds and rates of long COVID clinical diagnosis and high-confidence computationally derived diagnosis after adjusting for sex, demographics, and medical history.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , United States/epidemiology , Humans , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , Cohort Studies , SARS-CoV-2 , Vaccination
14.
medRxiv ; 2022 Oct 07.
Article in English | MEDLINE | ID: mdl-36238713

ABSTRACT

Importance: Characterizing the effect of vaccination on long COVID allows for better healthcare recommendations. Objective: To determine if, and to what degree, vaccination prior to COVID-19 is associated with eventual long COVID onset, among those with a documented COVID-19 infection. Design, Settings, and Participants: Retrospective cohort study of adults with evidence of COVID-19 between August 1, 2021 and January 31, 2022 based on electronic health records from eleven healthcare institutions taking part in the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, a project of the National Covid Cohort Collaborative (N3C). Exposures: Pre-COVID-19 receipt of a complete vaccine series versus no pre-COVID-19 vaccination. Main Outcomes and Measures: Two approaches to the identification of long COVID were used. In the clinical diagnosis cohort (n=47,752), ICD-10 diagnosis codes or evidence of a healthcare encounter at a long COVID clinic were used. In the model-based cohort (n=199,498), a computable phenotype was used. The association between pre-COVID vaccination and long COVID was estimated using IPTW-adjusted logistic regression and Cox proportional hazards. Results: In both cohorts, when adjusting for demographics and medical history, pre-COVID vaccination was associated with a reduced risk of long COVID (clinic-based cohort: HR, 0.66; 95% CI, 0.55-0.80; OR, 0.69; 95% CI, 0.59-0.82; model-based cohort: HR, 0.62; 95% CI, 0.56-0.69; OR, 0.70; 95% CI, 0.65-0.75). Conclusions and Relevance: Long COVID has become a central concern for public health experts. Prior studies have considered the effect of vaccination on the prevalence of future long COVID symptoms, but ours is the first to thoroughly characterize the association between vaccination and clinically diagnosed or computationally derived long COVID. Our results bolster the growing consensus that vaccines retain protective effects against long COVID even in breakthrough infections. 
Key Points: Question: Does vaccination prior to COVID-19 onset change the risk of long COVID diagnosis? Findings: Four observational analyses of EHRs showed a statistically significant reduction in long COVID risk associated with pre-COVID vaccination (first cohort: HR, 0.66; 95% CI, 0.55-0.80; OR, 0.69; 95% CI, 0.59-0.82; second cohort: HR, 0.62; 95% CI, 0.56-0.69; OR, 0.70; 95% CI, 0.65-0.75). Meaning: Vaccination prior to COVID onset has a protective association with long COVID even in the case of breakthrough infections.
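The inverse probability of treatment weighting (IPTW) step named in the methods can be sketched in a few lines. The sketch assumes propensity scores have already been estimated (typically by a regression of treatment on covariates, which is not shown); it uses unstabilized weights, and all names and values are hypothetical rather than drawn from the study.

```python
def iptw_weights(treated, propensity):
    """Unstabilized IPTW: 1/p for treated units, 1/(1 - p) for untreated."""
    return [1 / p if t else 1 / (1 - p) for t, p in zip(treated, propensity)]


def weighted_rate(outcomes, weights, include):
    """Weighted outcome rate among the units flagged in `include`."""
    num = sum(w * y for y, w, keep in zip(outcomes, weights, include) if keep)
    den = sum(w for w, keep in zip(weights, include) if keep)
    return num / den


# Hypothetical data: 1 = vaccinated / outcome occurred, 0 = otherwise.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]   # assumed P(vaccinated | covariates)
outcomes = [0, 1, 1, 1]

weights = iptw_weights(treated, propensity)
rate_treated = weighted_rate(outcomes, weights, treated)
rate_control = weighted_rate(outcomes, weights, [1 - t for t in treated])
print(weights, rate_treated, rate_control)
```

Units that received an unlikely treatment given their covariates get larger weights, which rebalances the two groups before the outcome model is fitted.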

15.
JMIR Public Health Surveill ; 7(3): e25807, 2021 03 16.
Article in English | MEDLINE | ID: mdl-33724195

ABSTRACT

BACKGROUND: Social media are important for monitoring perceptions of public health issues and for educating target audiences about health; however, limited information about the demographics of social media users makes it challenging to identify conversations among target audiences and limits how well social media can be used for public health surveillance and education outreach efforts. Certain social media platforms provide demographic information on followers of a user account, if given, but they are not always disclosed, and researchers have developed machine learning algorithms to predict social media users' demographic characteristics, mainly for Twitter. To date, there has been limited research on predicting the demographic characteristics of Reddit users. OBJECTIVE: We aimed to develop a machine learning algorithm that predicts the age segment of Reddit users, as either adolescents or adults, based on publicly available data. METHODS: This study was conducted between January and September 2020 using publicly available Reddit posts as input data. We manually labeled Reddit users' age by identifying and reviewing public posts in which Reddit users self-reported their age. We then collected sample posts, comments, and metadata for the labeled user accounts and created variables to capture linguistic patterns, posting behavior, and account details that would distinguish the adolescent age group (aged 13 to 20 years) from the adult age group (aged 21 to 54 years). We split the data into training (n=1660) and test sets (n=415) and performed 5-fold cross validation on the training set to select hyperparameters and perform feature selection. We ran multiple classification algorithms and tested the performance of the models (precision, recall, F1 score) in predicting the age segments of the users in the labeled data. 
To evaluate associations between each feature and the outcome, we calculated means and confidence intervals and compared the two age groups, with 2-sample t tests, for each transformed model feature. RESULTS: The gradient boosted trees classifier performed the best, with an F1 score of 0.78. The test set precision and recall scores were 0.79 and 0.89, respectively, for the adolescent group (n=254) and 0.78 and 0.63, respectively, for the adult group (n=161). The most important feature in the model was the number of sentences per comment (permutation score: mean 0.100, SD 0.004). Members of the adolescent age group tended to have created accounts more recently, have higher proportions of submissions and comments in the r/teenagers subreddit, and post more in subreddits with higher subscriber counts than those in the adult group. CONCLUSIONS: We created a Reddit age prediction algorithm with competitive accuracy using publicly available data, suggesting machine learning methods can help public health agencies identify age-related target audiences on Reddit. Our results also suggest that there are characteristics of Reddit users' posting behavior, linguistic patterns, and account features that distinguish adolescents from adults.
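The reported metrics can be cross-checked with the standard F1 formula, the harmonic mean of precision and recall. Working from the rounded per-group values above, a support-weighted average of the two per-group F1 scores comes out near the reported 0.78. Whether the authors' overall figure is support-weighted is an assumption; the abstract does not say.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract: (precision, recall, test-set n).
groups = {
    "adolescent": (0.79, 0.89, 254),
    "adult": (0.78, 0.63, 161),
}

f1_by_group = {name: f1_score(p, r) for name, (p, r, _) in groups.items()}
total = sum(n for _, _, n in groups.values())
weighted_f1 = sum(f1_by_group[name] * n for name, (_, _, n) in groups.items()) / total
macro_f1 = sum(f1_by_group.values()) / len(f1_by_group)
print(f1_by_group, round(weighted_f1, 2), round(macro_f1, 2))
```

The group sizes also confirm the split described in the methods: 254 + 161 = 415 test users, alongside 1660 training users, which is an 80/20 partition.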


Subject(s)
Algorithms , Machine Learning , Metadata , Social Media/statistics & numerical data , Adolescent , Adult , Age Factors , Humans , Middle Aged , Models, Psychological , Reproducibility of Results , Young Adult
17.
Vital Health Stat 1 ; (189): 1-29, 2021 10.
Article in English | MEDLINE | ID: mdl-34662269

ABSTRACT

Objectives: Medical coding, or the translation of healthcare information into numeric codes, is expensive and time intensive. This exploratory study evaluates the use of machine learning classifiers to perform automated medical coding for large statistical healthcare surveys.


Subject(s)
Clinical Coding , Machine Learning , Delivery of Health Care , Health Care Surveys , Translations
18.
EGEMS (Wash DC) ; 7(1): 40, 2019 Aug 05.
Article in English | MEDLINE | ID: mdl-31406697

ABSTRACT

The results of many large-scale federal or multi-site evaluations are typically compiled into long reports which end up sitting on policymakers' shelves. Moreover, the information policymakers need from these reports is often buried in the report and may not be remembered, understood, or readily accessible to the policymaker when it is needed. This is not a new challenge for evaluators, and advances in statistical methodology, while they have created greater opportunities for insight, may compound the challenge by creating multiple lenses through which evidence can be viewed. The descriptive evidence from traditional frequentist models, while familiar, is frequently misunderstood, while newer Bayesian methods provide evidence which is intuitive, but less familiar. These methods are complementary, but presenting both increases the amount of evidence stakeholders and policymakers may find useful. In response to these challenges, we developed an interactive dashboard that synthesizes quantitative and qualitative data and allows users to access the evidence they want, when they want it, allowing each user a customized, and customizable, view into the data collected for one large-scale federal evaluation. This offers the opportunity for policymakers to select the specifics that are most relevant to them at any moment, and also apply their own risk tolerance to the probabilities of various outcomes.

19.
PLoS One ; 12(8): e0183537, 2017.
Article in English | MEDLINE | ID: mdl-28850620

ABSTRACT

Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles' metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fit for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen's d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1) while the model containing only Twitter metadata features was least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as "school" for youth and "college" for young adults. Overall, it was more challenging to predict older adults accurately. 
These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be helpful for informing public health surveillance and evaluation research.
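The precision, recall, and F1 metrics named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `per_class_metrics` and the toy labels are hypothetical, and only the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two) are assumed.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this label
        else:
            fp[pred] += 1    # predicted label was wrong
            fn[truth] += 1   # true label was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical toy labels for illustration only, not data from the study.
y_true = ["youth", "youth", "young adult", "young adult", "adult", "adult"]
y_pred = ["youth", "young adult", "young adult", "young adult", "adult", "youth"]
metrics = per_class_metrics(y_true, y_pred)
for label, (p, r, f1) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because model-level figures like those in the abstract are usually averages over classes, an averaged F1 need not equal the harmonic mean of the averaged precision and recall.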


Subject(s)
Judgment, Language, Metadata, Social Media, Adolescent, Adult, Age Factors, Data Collection, Humans, Theoretical Models, Young Adult
20.
JMIR Public Health Surveill ; 3(3): e63, 2017 Sep 26.
Article in English | MEDLINE | ID: mdl-28951381

ABSTRACT

BACKGROUND: Despite concerns about their health risks, e-cigarettes have gained popularity in recent years. Concurrent with the recent increase in e-cigarette use, social media sites such as Twitter have become a common platform for sharing information about e-cigarettes and for marketing them. Monitoring the trends in e-cigarette-related social media activity requires timely assessment of the content of posts and the types of users generating the content. However, little is known about the diversity of the types of users responsible for generating e-cigarette-related content on Twitter. OBJECTIVE: The aim of this study was to demonstrate a novel methodology for automatically classifying Twitter users who tweet about e-cigarette-related topics into distinct categories. METHODS: We collected approximately 11.5 million e-cigarette-related tweets posted between November 2014 and October 2016 and obtained a random sample of Twitter users who tweeted about e-cigarettes. Trained human coders examined the handles' profiles and manually categorized each as one of the following user types: individual (n=2168), vaper enthusiast (n=334), informed agency (n=622), marketer (n=752), and spammer (n=1021). Next, the Twitter metadata as well as a sample of tweets for each labeled user were gathered, and features that reflect users' metadata and tweeting behavior were analyzed. Finally, multiple machine learning algorithms were tested to identify a model with the best performance in classifying user types. RESULTS: Using a classification model that included metadata and features associated with tweeting behavior, we were able to predict with relatively high accuracy five different types of Twitter users who tweet about e-cigarettes (average F1 score=83.3%). Accuracy varied by user type, with F1 scores of individuals, informed agencies, marketers, spammers, and vaper enthusiasts being 91.1%, 84.4%, 81.2%, 79.5%, and 47.1%, respectively. Vaper enthusiasts were the most challenging user type to predict accurately and were commonly misclassified as marketers. The inclusion of additional tweet-derived features that capture tweeting behavior was found to significantly improve model performance beyond metadata features alone, with an overall F1 score gain of 10.6%. CONCLUSIONS: This study provides a method for classifying five different types of users who tweet about e-cigarettes. Our model achieved high levels of classification performance for most groups, and examining the tweeting behavior was critical in improving the model performance. Results can help identify groups engaged in conversations about e-cigarettes online to help inform public health surveillance, education, and regulatory efforts.
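The abstract does not state how the average F1 score was computed. As a rough check, the sketch below compares an unweighted (macro) mean of the five reported per-type F1 scores with a mean weighted by the reported labeled-sample counts. Treating those counts as the evaluation-set class sizes is an assumption made here for illustration, not something the abstract confirms.

```python
# Per-type F1 scores (%) and labeled-sample counts, as reported in the abstract.
f1_by_type = {
    "individual": 91.1,
    "informed agency": 84.4,
    "marketer": 81.2,
    "spammer": 79.5,
    "vaper enthusiast": 47.1,
}
n_by_type = {
    "individual": 2168,
    "informed agency": 622,
    "marketer": 752,
    "spammer": 1021,
    "vaper enthusiast": 334,
}

# Unweighted (macro) mean: every user type counts equally.
macro_f1 = sum(f1_by_type.values()) / len(f1_by_type)

# Support-weighted mean: each type counts in proportion to its sample size.
# ASSUMPTION: the labeled-sample counts stand in for evaluation-set sizes.
total_users = sum(n_by_type.values())
weighted_f1 = sum(f1_by_type[t] * n_by_type[t] for t in f1_by_type) / total_users

print(f"macro F1 = {macro_f1:.1f}%, weighted F1 = {weighted_f1:.1f}%")
```

Under that assumption, the weighted mean comes to about 83.3%, matching the reported average, while the macro mean is about 76.7%. The gap reflects how the small, hard-to-classify vaper enthusiast group contributes little to a weighted average.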
