Results 1 - 20 of 47
1.
BMC Bioinformatics ; 25(1): 23, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38216898

ABSTRACT

BACKGROUND: With the exponential growth of high-throughput technologies, multiple pathway analysis methods have been proposed to estimate pathway activities from gene expression profiles. These pathway activity inference methods fall into two main categories: non-Topology-Based (non-TB) and Pathway Topology-Based (PTB) methods. Although several review and survey articles have discussed the topic from different angles, a systematic assessment and comparison of the robustness of these approaches has been lacking. RESULTS: This study therefore presents a comprehensive robustness evaluation of seven widely used pathway activity inference methods on six cancer datasets, based on two assessments. The first assessment investigates the robustness of the pathway activities inferred by each method, while the second assesses the robustness of the risk-active pathways and genes predicted by these methods. The mean reproducibility power and the total number of identified informative pathways and genes were evaluated. In the first assessment, the mean reproducibility power of the pathway activity inference methods generally decreased as the number of selected pathways increased. Entropy-based Directed Random Walk (e-DRW) distinctly outperformed the other methods, exhibiting the greatest reproducibility power across all cancer datasets. The second assessment, by contrast, shows that no method provides satisfactory results across datasets. CONCLUSION: Nevertheless, PTB methods generally appear to perform better than non-TB methods in producing greater reproducibility power and identifying potential cancer markers.


Subjects
Neoplasms; Humans; Reproducibility of Results; Neoplasms/genetics; Entropy; Gene Expression
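The abstract does not include an implementation, and e-DRW's exact algorithm cannot be reconstructed from it; as a loose illustration only, the sketch below scores a pathway by a random walk with restart over its directed gene graph, seeding the walk with per-gene differential-expression statistics. The graph, weighting scheme, and scoring rule are all illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only (not the authors' e-DRW implementation): weight a
# pathway's genes by a directed random walk with restart, then combine the
# weights with per-gene statistics into per-sample activity scores.
import numpy as np

def random_walk_weights(adj, seed, restart=0.7, tol=1e-8):
    """Stationary weights of a random walk with restart on a directed graph."""
    rowsum = adj.sum(axis=1, keepdims=True)
    P = np.divide(adj, rowsum, out=np.zeros_like(adj), where=rowsum > 0)
    seed = seed / seed.sum()
    w = seed.copy()
    while True:
        w_next = restart * seed + (1 - restart) * (P.T @ w)
        if np.abs(w_next - w).sum() < tol:
            return w_next
        w = w_next

def pathway_activity(expr, labels, adj):
    """expr: genes x samples; labels: 0/1 per sample; adj: directed pathway graph."""
    case, ctrl = expr[:, labels == 1], expr[:, labels == 0]
    # t-like statistic per gene; its magnitude seeds the walk (an assumption).
    t = (case.mean(1) - ctrl.mean(1)) / (case.std(1) + ctrl.std(1) + 1e-9)
    w = random_walk_weights(adj, np.abs(t) + 1e-9)
    # Per-sample activity: walk-weighted, sign-corrected sum of expression.
    return ((w * np.sign(t))[:, None] * expr).sum(axis=0)

rng = np.random.default_rng(0)
expr = rng.random((5, 20))                       # toy 5-gene, 20-sample matrix
labels = np.r_[np.ones(10), np.zeros(10)].astype(int)
adj = (rng.random((5, 5)) > 0.6).astype(float)   # toy directed pathway graph
np.fill_diagonal(adj, 0)
print(pathway_activity(expr, labels, adj))       # one activity score per sample
```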
2.
Proc Natl Acad Sci U S A ; 118(2)2021 01 12.
Article in English | MEDLINE | ID: mdl-33431567

ABSTRACT

Although widespread declines in insect biomass and diversity are of increasing concern within the scientific community, it remains unclear whether attention to pollinator declines has also increased within information sources serving the general public. Examining patterns of journalistic attention to the pollinator population crisis can also inform efforts to raise awareness about the importance of declines of insect species providing ecosystem services beyond pollination. We used the Global News Index developed by the Cline Center for Advanced Social Research at the University of Illinois at Urbana-Champaign to track news attention to pollinator topics in nearly 25 million news items published by two American national newspapers and four international wire services over the past four decades. We found vanishingly low levels of attention to pollinator population topics relative to coverage of climate change, which we use as a comparison topic. In the most recent subset of ∼10 million stories published from 2007 to 2019, 1.39% (137,086 stories) refer to climate change/global warming, while only 0.02% (1,780) refer to pollinator populations in all contexts, and just 0.007% (679) refer to pollinator declines. Substantial increases in news attention were detectable only in US national newspapers. We also find that, while climate change stories appear primarily in newspaper "front sections," pollinator population stories remain largely marginalized in "science" and "back section" reports. At the same time, news reports about pollinator populations increasingly link the issue to climate change, which might ultimately help raise public awareness to effect needed policy changes.


Subjects
Biodiversity; Extinction, Biological; Insects; Mass Media/trends; Pollination; Animals; Climate Change; Information Dissemination; Mass Media/statistics & numerical data
3.
J Med Internet Res ; 26: e54617, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39292502

ABSTRACT

BACKGROUND: Depressive disorders have substantial global implications, leading to various social consequences, including decreased occupational productivity and a high disability burden. Early detection of and intervention for clinically significant depression have gained attention; however, existing depression screening tools, such as the Center for Epidemiologic Studies Depression Scale, have limitations in objectivity and accuracy. Therefore, researchers are identifying objective indicators of depression, including image analysis, blood biomarkers, and ecological momentary assessments (EMAs). Among EMAs, user-generated text data, particularly from diary writing, have emerged as a clinically significant and analyzable source for detecting or diagnosing depression, leveraging advancements in large language models (LLMs) such as ChatGPT. OBJECTIVE: We aimed to detect depression from user-generated diary text collected through an emotional diary writing app, using an LLM, and to validate the value of semistructured diary text data as an EMA data source. METHODS: Participants were assessed for depression using the Patient Health Questionnaire, and suicide risk was evaluated using the Beck Scale for Suicide Ideation, before starting and after completing the 2-week diary writing period. The text data from the daily diaries were also used in the analysis. The performance of leading LLMs, such as ChatGPT with GPT-3.5 and GPT-4, was assessed with and without GPT-3.5 fine-tuning on the training data set. The model performance comparison involved the use of chain-of-thought and zero-shot prompting to analyze the text structure and content. RESULTS: We used 428 diaries from 91 participants; GPT-3.5 fine-tuning demonstrated superior performance in depression detection, achieving an accuracy of 0.902 and a specificity of 0.955. However, the highest balanced accuracy (0.844) was achieved by GPT-3.5 without fine-tuning or prompting techniques; it displayed a recall of 0.929. CONCLUSIONS: Both GPT-3.5 and GPT-4 demonstrated reasonable performance in recognizing depression risk from diaries. Our findings highlight the potential clinical usefulness of user-generated text data for detecting depression. In addition to measurable indicators, such as step count and physical activity, future research should increasingly emphasize qualitative digital expression.


Subjects
Depression; Humans; Female; Male; Adult; Depression/diagnosis; Mental Health; Mass Screening/methods; Young Adult; Mobile Applications; Diaries as Topic; Language; Middle Aged
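As a hedged sketch of the zero-shot prompting setup this study evaluates, the snippet below asks an LLM to label a single diary entry. The prompt wording, label set, and the use of the OpenAI Python client with gpt-3.5-turbo are assumptions for illustration; the study's actual prompts, fine-tuning data, and decision thresholds are not given in the abstract.

```python
# Hedged sketch of zero-shot depression-risk labelling of one diary entry.
# Prompt text and model name are illustrative assumptions, not the study's.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_diary(entry: str) -> str:
    prompt = (
        "You will read a short personal diary entry. Based only on its text, "
        "answer with exactly one word, 'depressed' or 'not_depressed', "
        "indicating whether the writer shows signs of clinically significant "
        "depression.\n\nDiary entry:\n" + entry
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",          # the study also evaluated GPT-4
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                   # deterministic labels for evaluation
    )
    return resp.choices[0].message.content.strip()

print(classify_diary("I couldn't get out of bed again. Nothing feels worth doing."))
```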
4.
BMC Med Inform Decis Mak ; 23(1): 132, 2023 07 22.
Article in English | MEDLINE | ID: mdl-37481523

ABSTRACT

BACKGROUND: Topic models are a class of unsupervised machine learning models that facilitate summarization, browsing, and retrieval from large unstructured document collections. This study reviews several methods for assessing the quality of unsupervised topic models estimated using non-negative matrix factorization. Techniques for topic model validation have been developed across disparate fields; we synthesize this literature, discuss the advantages and disadvantages of different techniques for topic model validation, and illustrate their usefulness for guiding model selection on a large clinical text corpus. DESIGN, SETTING AND DATA: Using a retrospective cohort design, we curated a text corpus containing 382,666 clinical notes collected from 01/01/2017 through 12/31/2020 from primary care electronic medical records in Toronto, Canada. METHODS: Several topic model quality metrics have been proposed to assess different aspects of model fit. We explored the following metrics: reconstruction error, topic coherence, rank-biased overlap, Kendall's weighted tau, partition coefficient, partition entropy, and the Xie-Beni statistic. Depending on context, cross-validation and/or bootstrap stability analysis were used to estimate these metrics on our corpus. RESULTS: Cross-validated reconstruction error favored large topic models (K ≥ 100 topics) on our corpus. Stability analysis using topic coherence and the Xie-Beni statistic also favored large models (K = 100 topics). Rank-biased overlap and Kendall's weighted tau favored small models (K = 5 topics). Few evaluation metrics suggested mid-sized topic models (25 ≤ K ≤ 75) as optimal, yet human judgement suggested that mid-sized topic models produced expressive low-dimensional summarizations of the corpus. CONCLUSIONS: Topic model quality indices are transparent quantitative tools for guiding model selection and evaluation. Our empirical illustration demonstrated that different topic model quality indices favor models of different complexity and may not select models aligning with human judgment. This suggests that different metrics capture different aspects of model goodness of fit. A combination of topic model quality indices, coupled with human validation, may be useful for appraising unsupervised topic models.


Subjects
Algorithms; Benchmarking; Humans; Retrospective Studies; Canada; Electronic Health Records
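Of the quality metrics listed, cross-validated reconstruction error is the simplest to illustrate. The sketch below compares NMF models across a toy grid of K values on a toy corpus; the corpus, grid, and fold count are placeholders, not the study's settings.

```python
# Sketch: rank NMF topic models of different sizes K by held-out reconstruction
# error, one of the quality metrics reviewed above. Corpus and K grid are toy.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold

docs = ["patient reports chest pain", "flu vaccine given today",
        "chest xray ordered", "follow up for diabetes",
        "blood pressure elevated", "vaccine side effects mild"] * 20

X = TfidfVectorizer().fit_transform(docs)

for K in (2, 5, 10):
    errs = []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(docs):
        model = NMF(n_components=K, init="nndsvda", random_state=0).fit(X[train])
        W = model.transform(X[test])                # held-out document loadings
        recon = W @ model.components_               # reconstructed term weights
        errs.append(np.linalg.norm(X[test].toarray() - recon))
    print(f"K={K:3d}  mean held-out reconstruction error {np.mean(errs):.3f}")
```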
5.
Behav Res Methods ; 55(8): 4455-4477, 2023 12.
Article in English | MEDLINE | ID: mdl-36443583

ABSTRACT

Understanding what groups stand for is integral to a diverse array of social processes, ranging from understanding political conflicts to organisational behaviour to promoting public health behaviours. Traditionally, researchers rely on self-report methods such as interviews and surveys to assess groups' collective self-understandings. Here, we demonstrate the value of using naturally occurring online textual data to map the similarities and differences between real-world groups' collective self-understandings. We use machine learning algorithms to assess similarities between 15 diverse online groups' linguistic style, and then use multidimensional scaling to map the groups in two-dimensional space (N = 1,779,098 Reddit comments). We then use agglomerative and k-means clustering techniques to assess how the 15 groups cluster, finding four behaviourally distinct group types: vocational, collective action (comprising political and ethnic/religious identities), relational, and stigmatised groups, with stigmatised groups having a less distinctive behavioural profile than the other group types. Study 2 is a secondary data analysis in which we find strong relationships between the coordinates of each group in multidimensional space and the groups' values. In Study 3, we demonstrate how this approach can be used to track the development of groups' collective self-understandings over time. Using transgender Reddit data (N = 1,095,620 comments) as a proof of concept, we track the gradual politicisation of the transgender group over the past decade. The automatic nature of this methodology renders it advantageous for monitoring multiple online groups simultaneously. This approach has implications for both governmental agencies and social researchers more generally. Future research avenues and applications are discussed.


Subjects
Linguistics; Humans; Machine Learning; Social Media
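A minimal sketch of the mapping pipeline described above: per-group style features, pairwise distances, multidimensional scaling into two dimensions, then clustering. The toy groups and the small function-word feature set are illustrative assumptions; the study used far richer linguistic-style features.

```python
# Sketch of the pipeline: style features per group, pairwise distances,
# 2-D multidimensional scaling, then clustering. Data and features are toy.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import MDS
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_distances

groups = {
    "gamers": "we played and then we won because we practiced together",
    "runners": "i ran today and i felt that my pace was better than before",
    "investors": "the market fell but we think that it will recover soon",
}

# Style proxy: relative frequencies of common function words (an assumption).
function_words = ["i", "we", "the", "and", "that", "because", "but", "my"]
vec = CountVectorizer(vocabulary=function_words, token_pattern=r"(?u)\b\w+\b")
X = vec.fit_transform(groups.values()).toarray().astype(float)
X /= X.sum(axis=1, keepdims=True)                  # normalize per group

D = cosine_distances(X)                            # group-by-group distances
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)      # 2-D map of the groups
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
for name, xy, lab in zip(groups, coords, labels):
    print(f"{name:10s} cluster={lab} position={xy.round(2)}")
```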
6.
BMC Public Health ; 22(1): 34, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34991545

ABSTRACT

BACKGROUND: During the COVID-19 pandemic, the UK government implemented a series of guidelines, rules, and restrictions to change citizens' behaviour to tackle the spread of the virus, such as the promotion of face masks and the imposition of lockdown stay-at-home orders. The success of such measures requires active co-operation on the part of citizens, but compliance was not complete. Detailed research is required on the factors that aided or hindered compliance with these measures. METHODS: To understand the facilitators and barriers to compliance with COVID-19 guidelines, we used structural topic modelling, a text mining technique, to extract themes from over 26,000 free-text survey responses from 17,500 UK adults, collected between 17 November and 23 December 2020. RESULTS: The main factors facilitating compliance were desires to reduce risk to oneself, one's family and friends, and, to a lesser extent, the general public. Also important were a desire to return to normality, the availability of activities and technological means to contact family and friends, and the ability to work from home. Identified barriers were difficulties maintaining social distancing in public (due to the actions of other people or environmental constraints), the need to provide or receive support from family and friends, social isolation, missing loved ones, mental health impacts, perceiving the risks as low, social pressure not to comply, and difficulties understanding and keeping abreast of changing rules. Several of the barriers and facilitators raised were related to participant characteristics. Notably, women were more likely to discuss needing to provide or receive mental health support from friends and family. CONCLUSION: The results demonstrate that an array of factors contributed to compliance with guidelines. Of particular policy importance, they suggest that government communication that emphasizes the potential risks of the virus and provides simple, consistent guidance on how to reduce its spread would improve compliance with preventive behaviours as COVID-19 continues and in future pandemics.


Subjects
COVID-19; Adult; Communicable Disease Control; Female; Humans; Pandemics; SARS-CoV-2; United Kingdom
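The study used structural topic modelling (typically R's stm package, which models topic prevalence as a function of document covariates). The sketch below is a simplified Python stand-in using plain LDA, without the covariate component, to extract candidate themes from free-text survey responses; the toy responses are invented.

```python
# Simplified stand-in for structural topic modelling: plain LDA over free-text
# survey responses, without STM's document covariates. Responses are toy.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

responses = [
    "i comply to protect my elderly parents from the virus",
    "working from home makes following the rules easy",
    "hard to keep distance in the supermarket aisles",
    "i miss my friends and my mental health is suffering",
    "the rules change so often i cannot keep track",
    "video calls help me stay in touch with family",
] * 10  # toy stand-in for the ~26,000 survey responses

vec = CountVectorizer(stop_words="english", min_df=2)
X = vec.fit_transform(responses)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"theme {k}: {', '.join(top)}")
```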
7.
Sensors (Basel) ; 22(3)2022 Jan 23.
Article in English | MEDLINE | ID: mdl-35161598

ABSTRACT

With the rapid proliferation of social networking sites (SNS), automatic topic extraction from the short text messages posted on them has become an important source of information for understanding current social trends and needs. Latent Dirichlet Allocation (LDA), a probabilistic generative model, is one of the most popular topic models in natural language processing (NLP) and has been widely used for information retrieval, topic extraction, and document analysis. Unlike the long texts of formal documents, messages on SNS are generally short, and traditional topic models such as LDA or pLSA (probabilistic latent semantic analysis) suffer performance degradation on short texts because each text provides little word co-occurrence information. Various techniques have evolved to cope with this problem and make topic modeling of short texts interpretable; combining topic models with word embeddings pretrained on an external corpus is one of them. Following recent developments in deep neural networks (DNNs) and deep generative models, neural topic models (NTMs) have emerged, offering flexibility and high performance in topic modeling. However, there has been very little research on neural topic models with pretrained word embeddings for generating high-quality topics from short texts. In this work, in addition to using pretrained word embeddings, we propose a fine-tuning stage on the original corpus for training neural topic models, in order to generate semantically coherent, corpus-specific topics. An extensive study with eight neural topic models was conducted to check the effectiveness of the additional fine-tuning and of pretrained word embeddings in generating interpretable topics, using simulation experiments on several benchmark datasets. The extracted topics are evaluated with different metrics of topic coherence and topic diversity, and we also studied the performance of the models in classification and clustering tasks. Our study concludes that although auxiliary word embeddings from a large external corpus improve the topic coherence of short texts, an additional fine-tuning stage is needed to generate more corpus-specific topics from short-text data.


Subjects
Text Messaging; Cluster Analysis; Information Storage and Retrieval; Natural Language Processing; Neural Networks, Computer
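The topic evaluation metrics named above can be made concrete. The sketch below implements two common definitions from the topic-modeling literature: topic diversity (the fraction of unique words among the top-N words of all topics) and an NPMI-based coherence estimated from document co-occurrence; the paper's exact formulations may differ.

```python
# Two common topic-quality metrics: topic diversity and NPMI coherence.
# Definitions follow standard usage, not necessarily this paper's setup.
import math
from itertools import combinations

def topic_diversity(topics, topn=10):
    """Fraction of unique words among the top-N words of all topics."""
    words = [w for t in topics for w in t[:topn]]
    return len(set(words)) / len(words)

def npmi_coherence(topic, docs, topn=5, eps=1e-12):
    """Average normalized PMI over word pairs, from document co-occurrence."""
    n = len(docs)
    sets = [set(d.split()) for d in docs]
    def p(*ws):  # fraction of documents containing all the given words
        return sum(all(w in s for w in ws) for s in sets) / n
    scores = []
    for w1, w2 in combinations(topic[:topn], 2):
        p1, p2, p12 = p(w1), p(w2), p(w1, w2)
        if p12 > 0:
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12 + eps))
    return sum(scores) / len(scores) if scores else 0.0

docs = ["cat dog pet", "dog pet food", "stock market price", "market price rise"]
topics = [["dog", "pet", "cat"], ["market", "price", "stock"]]
print("diversity:", topic_diversity(topics, topn=3))
for t in topics:
    print(t, "NPMI:", round(npmi_coherence(t, docs, topn=3), 3))
```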
8.
Sensors (Basel) ; 22(17)2022 Aug 27.
Article in English | MEDLINE | ID: mdl-36080927

ABSTRACT

This article focuses on the problem of detecting toxicity in online discussions. Toxicity has become a serious problem at a time when people are strongly influenced by opinions on social networks. We offer a solution based on classification models that use machine learning to classify short texts on social networks into multiple degrees of toxicity. The classification models use both classic machine learning methods, such as naïve Bayes and the support vector machine (SVM), and ensemble methods, such as bagging and random forest (RF). The models were created using text data that we extracted from social networks in the Slovak language. The labelling of our dataset of short texts into multiple classes (the degrees of toxicity) was provided automatically by a method based on a lexicon approach to text processing. This lexicon method required creating a dictionary of toxic words in the Slovak language, which is another contribution of this work. Finally, an application was created based on the learned machine learning models; it can be used to detect the degree of toxicity of new social network comments as well as for experimentation with various machine learning methods. We achieved the best results using an SVM (average accuracy = 0.89, F1 = 0.79). This model also outperformed ensemble learning with the RF and bagging methods, although the ensemble methods achieved better results than naïve Bayes.


Subjects
Machine Learning; Support Vector Machine; Bayes Theorem; Humans
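A minimal sketch of the two-stage approach described: automatic lexicon-based labelling of short texts into toxicity degrees, followed by an SVM trained on TF-IDF features. The tiny English lexicon and the degree thresholds are toy assumptions; the paper built a Slovak dictionary of toxic words.

```python
# Sketch of the two-stage approach: (1) label short texts by a toxic-word
# lexicon, (2) train an SVM on TF-IDF features. Lexicon and thresholds are toy.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

TOXIC_LEXICON = {"idiot", "stupid", "trash", "hate"}

def lexicon_label(text):
    hits = sum(w in TOXIC_LEXICON for w in text.lower().split())
    return 0 if hits == 0 else (1 if hits == 1 else 2)  # degrees of toxicity

comments = ["you are an idiot", "nice photo thanks for sharing",
            "stupid idiot post total trash", "i hate this stupid trash idea",
            "great game last night", "what a lovely day"] * 10
labels = [lexicon_label(c) for c in comments]       # automatic labelling

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(comments, labels)
print(clf.predict(["this post is trash", "thanks, very helpful"]))
```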
9.
Sensors (Basel) ; 22(23)2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36502024

ABSTRACT

This article focuses on the problem of detecting disinformation about COVID-19 in online discussions. As the Internet expands, so does the amount of content on it. Alongside content based on facts, a large amount of content is manipulated, which negatively affects the whole of society. This effect is currently compounded by the ongoing COVID-19 pandemic, which has caused people to spend even more time online and to become more invested in this fake content. This work gives a brief overview of what toxic information looks like, how it spreads, and how its dissemination might be prevented through early recognition of disinformation using deep learning. We investigated the overall suitability of deep learning for detecting disinformation in conversational content and compared architectures based on convolutional and recurrent principles. We trained three detection models based on three architectures: CNNs (convolutional neural networks), LSTM (long short-term memory) networks, and their combination. We achieved the best results using LSTM (F1 = 0.8741, accuracy = 0.8628), but the results of all three architectures were comparable; for example, the CNN+LSTM architecture achieved F1 = 0.8672 and accuracy = 0.852. We find that introducing a convolutional component does not bring significant improvement. In comparison with our previous work, we note that of all forms of antisocial posts, disinformation is the most difficult to recognize, since it lacks the distinctive language of, for example, hate speech or toxic posts.


Subjects
COVID-19; Deep Learning; Humans; COVID-19/diagnosis; Pandemics; Neural Networks, Computer; Language
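A minimal Keras sketch of the best-performing LSTM variant: an embedding layer followed by an LSTM and a sigmoid output for binary classification of short texts. Vocabulary size, dimensions, and the toy examples are illustrative assumptions, not the paper's configuration.

```python
# Minimal Keras sketch of an LSTM disinformation classifier. Hyperparameters
# and the toy data are illustrative assumptions, not the paper's settings.
import tensorflow as tf

texts = tf.constant(["vaccines contain microchips, share before deleted",
                     "health ministry publishes weekly case statistics",
                     "secret cure suppressed by doctors, wake up",
                     "hospital opens new covid testing site downtown"])
labels = tf.constant([1, 0, 1, 0])  # 1 = disinformation, 0 = factual

vectorize = tf.keras.layers.TextVectorization(max_tokens=5000,
                                              output_sequence_length=32)
vectorize.adapt(texts)

model = tf.keras.Sequential([
    vectorize,
    tf.keras.layers.Embedding(input_dim=5000, output_dim=64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(texts, labels, epochs=3, verbose=0)
print(model.predict(tf.constant(["miracle cure they do not want you to know"])))
```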
10.
Sensors (Basel) ; 22(1)2021 Dec 27.
Article in English | MEDLINE | ID: mdl-35009698

ABSTRACT

This article addresses the important problem of detecting suspicious reviewers in online discussions on social networks, concentrating on a particular type of suspicious author: trolls. We used machine learning to build detection models that discriminate troll reviewers from common reviewers, as well as sentiment analysis to recognize the sentiment typical of trolls' comments. Sentiment analysis can be performed using either machine learning or a lexicon-based approach; we chose lexicon-based sentiment analysis for its better ability to capture the vocabulary typical of troll authors. We achieved accuracy = 0.95 and F1 = 0.80 using sentiment analysis. The best results among the machine learning methods were achieved by the support vector machine (accuracy = 0.986, F1 = 0.988), using a dataset with the full set of selected attributes. We conclude that the machine learning detection model is more successful than lexicon-based sentiment analysis, although the difference in accuracy is not as large as the difference in F1.


Subjects
Machine Learning; Sentiment Analysis; Attitude; Support Vector Machine
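A minimal sketch of the lexicon-based sentiment scoring used to flag troll-like reviewers: score each comment against positive and negative word lists, then flag authors whose average sentiment is strongly negative. The mini-lexicon and threshold are toy assumptions; the study used a dictionary tuned to troll vocabulary.

```python
# Sketch of lexicon-based sentiment scoring for troll flagging. The lexicon
# and decision threshold are toy assumptions.
POSITIVE = {"good", "great", "helpful", "love"}
NEGATIVE = {"awful", "pathetic", "liar", "garbage", "losers"}

def sentiment_score(comment):
    words = comment.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def looks_like_troll(comments, threshold=-0.5):
    """Average sentiment of an author's comments below threshold => suspicious."""
    avg = sum(sentiment_score(c) for c in comments) / len(comments)
    return avg < threshold

author_a = ["you losers believe anything", "pathetic garbage thread", "liar"]
author_b = ["great point", "this was helpful", "love this discussion"]
print(looks_like_troll(author_a), looks_like_troll(author_b))  # True False
```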
11.
Resour Conserv Recycl ; 164: 105146, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32905054

ABSTRACT

Household waste segregation and recycling rank high in the waste management hierarchy, yet their management remains a great challenge because of its high dependency on social behaviours. Integrating the Internet of Things (IoT) with subscription accounts on social media platforms related to household waste management could be a more effective and environmentally friendly publicity approach than traditional publicity via posters and newspapers. However, there is a paucity of literature on measuring social media publicity in household waste management, which makes it hard for practitioners to characterise and improve this publicity pathway. In this study, under an integrated framework, data mining approaches are employed or extended for multidimensional publicity analytics using the online footprints of the propagandist and users. A real-world case study of a subscription account on the WeChat platform, Shanghai Green Account, is analysed to reveal useful insights for personalised improvements to household waste management. This study suggests that current publicity related to household waste management leans towards being propagandist-centred in both the timing and topic dimensions. The identified high-engagement time slots are 12:00-13:00 and 21:00-22:00 on Thursdays. The overall relative publicity quality of historical posts is calculated as 0.95. Average user engagement under the macro policy in Shanghai rose by 138.5% from 2018 to 2019, during which the collections of biodegradable food waste and recyclable waste rose by 88.8% and 431.8%, respectively. Intelligent decision support based on publicity analytics could enhance household waste management through effective communication.

12.
J Med Internet Res ; 22(6): e16760, 2020 06 29.
Article in English | MEDLINE | ID: mdl-32597785

ABSTRACT

BACKGROUND: Clinical free-text data (eg, outpatient letters or nursing notes) represent a vast, untapped source of rich information that, if more accessible for research, would clarify and supplement information coded in structured data fields. Data usually need to be deidentified or anonymized before they can be reused for research, but there is a lack of established guidelines to govern effective deidentification and use of free-text information and avoid damaging data utility as a by-product. OBJECTIVE: This study aimed to develop recommendations for the creation of data governance standards to integrate with existing frameworks for personal data use, to enable free-text data to be used safely for research for patient and public benefit. METHODS: We outlined data protection legislation and regulations relating to the United Kingdom for context and conducted a rapid literature review and UK-based case studies to explore data governance models used in working with free-text data. We also engaged with stakeholders, including text-mining researchers and the general public, to explore perceived barriers and solutions in working with clinical free-text. RESULTS: We proposed a set of recommendations, including the need for authoritative guidance on data governance for the reuse of free-text data, to ensure public transparency in data flows and uses, to treat deidentified free-text data as potentially identifiable with use limited to accredited data safe havens, and to commit to a culture of continuous improvement to understand the relationships between the efficacy of deidentification and reidentification risks, so this can be communicated to all stakeholders. CONCLUSIONS: By drawing together the findings of a combination of activities, we present a position paper to contribute to the development of data governance standards for the reuse of clinical free-text data for secondary purposes. While working in accordance with existing data governance frameworks, there is a need for further work to take forward the recommendations we have proposed, with commitment and investment, to assure and expand the safe reuse of clinical free-text data for public benefit.


Subjects
Data Analysis; Humans; Reference Standards; Text Messaging
13.
J Med Internet Res ; 20(6): e231, 2018 06 29.
Article in English | MEDLINE | ID: mdl-29959110

ABSTRACT

BACKGROUND: Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has been done to compare automatic coding using NLP techniques and qualitative coding, which is critical to establish the viability of NLP as a useful, rigorous analysis procedure. OBJECTIVE: The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods. METHODS: We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyze data generated through 2 text (short message service) message survey questions, one about prescription drugs and the other about police interactions, sent to youth aged 14-24 years. We randomly assigned a question to each of the 2 experienced qualitative analysis teams for independent coding and analysis before receiving NLP results. A third team separately conducted NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of findings derived, (2) the quality of inferences generated, and (3) the time spent in analysis. RESULTS: The qualitative-only analysis for the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The qualitative and NLP-augmented analysis was the most comprehensive. For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer than an approach beginning with NLP for both the drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions. CONCLUSIONS: NLP provides both a foundation to code qualitatively more quickly and a method to validate qualitative findings. NLP methods were able to identify major themes found with traditional qualitative analysis but were not useful in identifying nuances. Traditional qualitative text analysis added important details and context.


Subjects
Natural Language Processing; Text Messaging/instrumentation; Humans
14.
J Insur Med ; 45(2): 98-102, 2015.
Article in English | MEDLINE | ID: mdl-27584845

ABSTRACT

Medical directors may be asked to analyze their company's experience with laboratory results. This practical research note uses the example of trying to predict the distribution of proposed insureds under a new preferred risk program to illustrate how to marshal a company's lab results into a dataset suitable for analysis.

15.
Diagnostics (Basel) ; 14(2)2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38248014

ABSTRACT

This study aims to establish advanced sampling methods for free-text data that allow semantic text mining models to be built efficiently with deep learning, for tasks such as identifying vertebral compression fracture (VCF) in radiology reports. We enrolled a total of 27,401 free-text radiology reports of X-ray examinations of the spine. Predictive performance was compared between text mining models built using supervised long short-term memory networks and independently derived by four sampling methods (vector sum minimization, vector sum maximization, stratified sampling, and simple random sampling) at four fixed percentages. The drawn samples formed the training set, and the remaining samples were used to validate each combination of sampling method and ratio. Predictive accuracy in identifying VCF was measured as the area under the receiver operating characteristic curve (AUROC). At sampling ratios of 1/10, 1/20, 1/30, and 1/40, the highest AUROCs were obtained with vector sum minimization: 0.981 (95% CI 0.980-0.983), 0.963 (95% CI 0.961-0.965), 0.907 (95% CI 0.904-0.911), and 0.895 (95% CI 0.891-0.899), respectively. The lowest AUROCs were obtained with vector sum maximization. This study proposes an advanced sampling method, vector sum minimization, for free-text data that can be applied efficiently to build text mining models by smartly drawing a small number of critical, representative samples.
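The abstract names "vector sum minimization" without implementation detail; the greedy sketch below is only one plausible reading: repeatedly pick the report whose embedding keeps the norm of the running vector sum smallest, which favours mutually cancelling (i.e., diverse) training samples. The embeddings and sampling ratio are toy stand-ins.

```python
# One plausible (assumed) reading of "vector sum minimization" sampling:
# greedily pick documents that keep the running vector sum small.
import numpy as np

def vector_sum_min_sample(vecs, n_samples):
    remaining = list(range(len(vecs)))
    chosen, total = [], np.zeros(vecs.shape[1])
    for _ in range(n_samples):
        # Pick the candidate minimizing ||total + v|| (greedy heuristic).
        best = min(remaining, key=lambda i: np.linalg.norm(total + vecs[i]))
        chosen.append(best)
        total += vecs[best]
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(0)
report_vecs = rng.normal(size=(200, 16))      # stand-in for report embeddings
train_idx = vector_sum_min_sample(report_vecs, n_samples=20)  # ~1/10 ratio
print(train_idx)
```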

16.
Stud Health Technol Inform ; 310: 1584-1585, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38426881

ABSTRACT

This study examined the effects of language differences between Korean and English on the performance of natural language processing in the classification task of identifying inpatient falls from unstructured nursing notes.


Subjects
Deep Learning; Humans; Accidental Falls/prevention & control; Inpatients; Electronic Health Records; Language; Natural Language Processing
17.
Pain Manag ; 14(4): 183-194, 2024.
Article in English | MEDLINE | ID: mdl-38717373

ABSTRACT

Background: Chronic neck and low back pain are very common and have detrimental effects on individuals and society. In this study, we explore the experiences of individuals with neck and/or back pain using a written narrative methodology. Materials & methods: A total of 92 individuals described their pain experience in written narratives, which were analyzed through thematic analysis and text data mining. Results: Participants wrote about their experience in terms of pain characteristics, the diagnosis process, pain consequences, coping strategies, pain triggers, well-being, and future expectations. Text data mining allowed us to identify co-occurrence networks, mainly related to pain characteristics, management, and triggers. Conclusion: Written narratives are useful for understanding individuals' experiences from their own point of view.




Subjects
Chronic Pain; Low Back Pain; Narration; Neck Pain; Humans; Low Back Pain/psychology; Low Back Pain/therapy; Low Back Pain/diagnosis; Male; Female; Chronic Pain/psychology; Chronic Pain/therapy; Chronic Pain/diagnosis; Neck Pain/psychology; Neck Pain/therapy; Neck Pain/diagnosis; Adult; Middle Aged; Adaptation, Psychological; Aged; Young Adult; Qualitative Research
18.
J Forensic Sci ; 69(4): 1289-1303, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38558223

ABSTRACT

We investigate likelihood ratio models motivated by digital forensics problems involving time-stamped user-generated event data from a device or account. Of specific interest are scenarios where the data may have been generated by a single individual (the device/account owner) or by two different individuals (the device/account owner and someone else), such as instances in which an account was hacked or a device was stolen before being associated with a crime. Existing likelihood ratio methods in this context require that a precise time be specified at which the device or account is purported to have changed hands (the changepoint); this is the known-changepoint likelihood ratio model. In this paper, we develop a likelihood ratio model that instead accommodates uncertainty in the changepoint using Bayesian techniques: an unknown-changepoint likelihood ratio model. We show that the likelihood ratio in this case can be calculated in closed form as an expression that is straightforward to compute. In experiments with simulated changepoints on real-world datasets, the unknown-changepoint model attains performance comparable to that of the known-changepoint model with a perfectly specified changepoint, and considerably outperforms the known-changepoint model with a misspecified changepoint, illustrating the benefit of capturing uncertainty in the changepoint.
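A toy numerical illustration of the central idea: marginalize the likelihood ratio over a uniform prior on the changepoint position. The Poisson event-count model and the known per-writer rates are stand-in assumptions; the paper derives a closed form for its own event-data model.

```python
# Toy illustration of an unknown-changepoint likelihood ratio: average the
# two-writer likelihood over all candidate changepoints (uniform prior).
# Poisson counts and known rates are stand-in assumptions.
import numpy as np
from scipy.stats import poisson

def lr_unknown_changepoint(counts, rate_owner, rate_other):
    """LR of 'two writers, unknown changepoint' vs 'owner generated it all'."""
    n = len(counts)
    log_owner = poisson.logpmf(counts, rate_owner)
    log_other = poisson.logpmf(counts, rate_other)
    # Numerator: average over candidate changepoints tau = 1..n-1
    # (a modeling choice that excludes the no-switch endpoints).
    seg_lik = [np.exp(log_owner[:tau].sum() + log_other[tau:].sum())
               for tau in range(1, n)]
    numerator = np.mean(seg_lik)
    denominator = np.exp(log_owner.sum())
    return numerator / denominator

# Hourly event counts: quiet owner early, busier 'someone else' later.
counts = np.array([1, 0, 2, 1, 6, 7, 5, 8])
print(lr_unknown_changepoint(counts, rate_owner=1.0, rate_other=6.0))  # >> 1
```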

19.
J Appl Stat ; 51(10): 1878-1893, 2024.
Article in English | MEDLINE | ID: mdl-39071253

ABSTRACT

As the online market grows rapidly, people rely increasingly on product reviews when purchasing products, so many companies and researchers are interested in analyzing product reviews, which are essentially text data. In the current literature, it is common to use only text analysis tools to analyze such text datasets. In our work, we instead propose a method that utilizes both a text analysis method, topic modeling, and a statistical network model to build a network among individuals and find interesting communities. We introduce a promising framework that incorporates a topic modeling technique to define the edges among individuals, forming a network, and uses stochastic blockmodels (SBMs) to find the communities. The power of the proposed method is demonstrated in a real-world application to an Amazon product review dataset.
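A minimal sketch of the framework: per-reviewer topic distributions from LDA define edges between individuals, and communities are then found on the resulting network. Where the paper fits stochastic blockmodels (SBMs), this sketch substitutes spectral clustering of the adjacency matrix as a lighter stand-in; the data and edge threshold are toy.

```python
# Sketch: LDA topic distributions define edges; communities found on the graph.
# Spectral clustering stands in for the paper's SBM step. Data are toy.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import SpectralClustering

reviews = ["battery life is great and it charges fast",
           "battery drains quickly, charger broke",
           "the novel's plot and characters were wonderful",
           "boring plot, flat characters, weak story",
           "fast charging, solid battery performance",
           "loved the story and character development"]

X = CountVectorizer(stop_words="english").fit_transform(reviews)
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)

adj = (cosine_similarity(theta) > 0.9).astype(float)  # edge if topics align
np.fill_diagonal(adj, 0)
communities = SpectralClustering(n_clusters=2, affinity="precomputed",
                                 random_state=0).fit_predict(adj)
print(communities)  # reviewers grouped into topic-based communities
```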

20.
J Med Internet Res ; 15(10): e215, 2013 Oct 03.
Article in English | MEDLINE | ID: mdl-24091380

ABSTRACT

BACKGROUND: Participants in medical forums often reveal personal health information about themselves in their online postings. To feel comfortable revealing sensitive personal health information, some participants may hide their identity by posting anonymously, using fake identities, nicknames, or pseudonyms that cannot readily be traced back to them. However, individual writing styles have unique features, and it may be possible to determine the true identity of an anonymous user through authorship attribution analysis. Although there has been previous work on the authorship attribution problem, there has been a dearth of research on automated authorship attribution in medical forums. The focus of this paper is to demonstrate that character-based authorship attribution works better than word-based methods in medical forums. OBJECTIVE: The goal was to build a system that accurately attributes authorship of messages posted on medical forums. The Authorship Attributor system uses text analysis techniques to crawl medical forums and automatically correlate messages written by the same authors, processing unstructured texts regardless of document type, context, and content. METHODS: The messages were labeled by the nicknames of the forum participants. We evaluated the system's performance through its accuracy on 6000 messages gathered from 2 medical forums on an in vitro fertilization (IVF) support website. RESULTS: Given 2 lists of candidate authors (30 and 50 candidates, respectively), we obtained F-score accuracies of 75% to 80% in detecting authors of messages containing 100 to 150 words on average, and 97.9% on longer messages containing at least 300 words. CONCLUSIONS: Authorship can be successfully detected in short free-form messages posted on medical forums. This raises a concern about the meaningfulness of anonymous posting on such forums. Authorship attribution tools can be used to warn consumers wishing to post anonymously about the likelihood of their identity being determined.


Subjects
Authorship; Confidentiality; Internet; Humans
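A minimal sketch of the character-based approach the paper argues for: TF-IDF over character n-grams with a linear classifier across candidate authors. The toy messages stand in for labelled forum posts; the Authorship Attributor system itself is not reproduced here.

```python
# Sketch of character-based authorship attribution: TF-IDF over character
# n-grams plus a linear classifier. Toy posts stand in for forum messages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

posts = ["honestly i think the protocol went fine this cycle!!",
         "honestly the clinic was great, fine experience!!",
         "We commenced stimulation on day two; results pending.",
         "We commenced the second protocol; outcome pending."]
authors = ["user_a", "user_a", "user_b", "user_b"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # char n-grams
    LinearSVC(),
)
clf.fit(posts, authors)
print(clf.predict(["honestly this cycle went fine!!",
                   "We commenced treatment; results pending."]))
```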