Results 1 - 11 of 11
1.
BMC Med Inform Decis Mak ; 24(1): 221, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39103849

ABSTRACT

Data augmentation is crucial in medical named entity recognition (NER) because of the unique challenges posed by this field. Medical data is characterized by high acquisition costs, specialized terminology, imbalanced distributions, and limited training resources. These factors make achieving high performance in medical NER particularly difficult. Data augmentation methods help to mitigate these issues by generating additional training samples, thus balancing data distribution, enriching the training dataset, and improving model generalization. This paper proposes two data augmentation methods, Contextual Random Replacement based on Word2Vec Augmentation (CRR) and Targeted Entity Random Replacement Augmentation (TER), aimed at addressing the scarcity and imbalance of data in the medical domain. When combined with a deep learning-based Chinese NER model, these methods can significantly enhance performance and recognition accuracy under limited resources. Experimental results demonstrate that both augmentation methods effectively improve the recognition capability of medical named entities. Specifically, the BERT-BiLSTM-CRF model achieved the highest F1 score of 83.587%, representing a 1.49% increase over the baseline model. This validates the importance and effectiveness of data augmentation in medical NER.


Subject(s)
Deep Learning , Humans , Natural Language Processing
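Illustrative sketch (not taken from the paper): the abstract above does not spell out how Targeted Entity Random Replacement works, so the following is only one plausible reading, in which each annotated entity is swapped for another entity of the same type. The BIO tag scheme, the function name `replace_entities`, the `entity_pool` dictionary, and the English example tokens are all assumptions made for illustration.

```python
import random


def replace_entities(tokens, tags, entity_pool, rng, p=1.0):
    """Swap BIO-tagged entity spans for same-type entities from entity_pool.

    tokens, tags: parallel lists (tags use a B-/I-/O scheme; an assumption).
    entity_pool: dict mapping entity type -> list of token lists (hypothetical).
    rng: a random.Random instance, passed in so results can be reproduced.
    p: probability of replacing each entity span.
    """
    out_tokens, out_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            etype = tag[2:]
            # Find the end of the current entity span.
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            candidates = entity_pool.get(etype, [])
            if candidates and rng.random() < p:
                new_span = list(rng.choice(candidates))
            else:
                new_span = tokens[i:j]
            # Re-tag the (possibly different-length) span.
            out_tokens.extend(new_span)
            out_tags.extend(["B-" + etype] + ["I-" + etype] * (len(new_span) - 1))
            i = j
        else:
            out_tokens.append(tokens[i])
            out_tags.append(tag)
            i += 1
    return out_tokens, out_tags


tokens = ["patient", "has", "type", "2", "diabetes"]
tags = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE"]
pool = {"DISEASE": [["hypertension"]]}
new_tokens, new_tags = replace_entities(tokens, tags, pool, random.Random(0))
```

Because the labels are regenerated together with the replaced span, each augmented sentence stays a valid training example even when the new entity has a different length.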
2.
Sensors (Basel) ; 23(20)2023 Oct 12.
Article in English | MEDLINE | ID: mdl-37896519

ABSTRACT

The explosive growth of online short videos has brought great challenges to the efficient management of video content classification, retrieval, and recommendation. Video features for video management can be extracted from video image frames by various algorithms, and they have been proven to be effective in the video classification of sensor systems. However, frame-by-frame processing of video image frames requires huge computing power, and classification algorithms based on a single modality of video features cannot meet the accuracy requirements in specific scenarios. In response to these concerns, we introduce a short video categorization architecture centered around cross-modal fusion in visual sensor systems, which jointly utilizes video features and text features to classify short videos and avoids processing a large number of image frames during classification. First, the image space is extended to three-dimensional space-time by a self-attention mechanism, and a series of patches are extracted from a single image frame. Each patch is linearly mapped into the embedding layer of the TimeSformer network and augmented with positional information to extract video features. Second, the text features of subtitles are extracted through the Bidirectional Encoder Representations from Transformers (BERT) pre-trained model. Finally, cross-modal fusion is performed based on the extracted video and text features, resulting in improved accuracy for short video classification tasks. Our experiments show that the proposed classification framework substantially outperforms alternative baseline video classification methods. This framework can be applied in sensor systems for potential video classification.

3.
Sensors (Basel) ; 23(10)2023 May 16.
Article in English | MEDLINE | ID: mdl-37430701

ABSTRACT

The rise in the use of social media networks has increased the prevalence of cyberbullying, and time is paramount to reduce the negative effects that derive from those behaviours on any social media platform. This paper aims to study the early detection problem from a general perspective by carrying out experiments over two independent datasets (Instagram and Vine), exclusively using users' comments. We used textual information from comments over baseline early detection models (fixed, threshold, and dual models) to apply three different methods of improving early detection. First, we evaluated the performance of Doc2Vec features. Finally, we also presented multiple instance learning (MIL) on early detection models and we assessed its performance. We applied time-aware precision (TaP) as an early detection metric to assess the performance of the presented methods. We conclude that the inclusion of Doc2Vec features improves the performance of baseline early detection models by up to 79.6%. Moreover, multiple instance learning shows an important positive effect for the Vine dataset, where smaller post sizes and less use of the English language are present, with a further improvement of up to 13%, but no significant enhancement is shown for the Instagram dataset.

4.
BMC Bioinformatics ; 23(1): 135, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35428172

ABSTRACT

BACKGROUND: Long non-coding RNA (lncRNA) plays important roles in physiological and pathological processes. Identifying lncRNA-protein interactions (LPIs) is essential to understand the molecular mechanism and infer the functions of lncRNAs. With the overwhelming size of the biomedical literature, extracting LPIs directly from the biomedical literature is essential, promising and challenging. However, no webserver exists for extracting LPI relationships from literature. RESULTS: LPInsider is developed as the first webserver for extracting LPIs from biomedical literature texts based on multiple text features (semantic word vectors, syntactic structure vectors, distance vectors, and part of speech vectors) and logistic regression. LPInsider allows researchers to extract LPIs by uploading PMID, PMCID, PMID List, or biomedical text. A manually filtered and highly reliable LPI corpus is integrated in LPInsider. Comprehensive experiments on different combinations of features and machine learning models indicate that the configuration used in LPInsider performs best. CONCLUSIONS: LPInsider is an efficient analytical tool for LPIs that helps researchers to enhance their comprehension of lncRNAs from text mining and saves them time. In addition, LPInsider is freely accessible from http://www.csbg-jlu.info/LPInsider/ with no login requirement. The source code and LPIs corpus can be downloaded from https://github.com/qiufengdiewu/LPInsider .


Subject(s)
RNA, Long Noncoding , Computational Biology , Data Mining , Machine Learning , RNA, Long Noncoding/genetics , Software
5.
Read Teach ; 74(3): 243-253, 2020.
Article in English | MEDLINE | ID: mdl-33362300

ABSTRACT

Across the globe, students have been away from schools and their teachers, but literacy learning has continued. In many countries, students' literacy proficiency is often measured via high-stakes assessment tests. However, such tests do not make visible students' literacy lives away from formal learning settings, so students are positioned as task responders, rather than as agentive readers and writers. The authors explore the fluidity and diversity of literacy events and practices for students and their teachers observed during the recent period of COVID-19 lockdown restrictions.

6.
Heliyon ; 10(16): e35812, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39247283

ABSTRACT

Video content on the web has increased explosively during the past decade, thanks to open access to platforms such as Facebook and YouTube. YouTube is currently the second-largest social media platform, with more than 37 million channels. YouTube revealed at a recent press event that 30,000 new videos are posted per hour, or 720,000 per day. There is a need for an advanced deep learning-based approach to categorize the huge database of YouTube videos. This study aims to develop an artificial intelligence-based approach to categorize YouTube videos. This study analyzes the textual information related to videos, such as titles, descriptions, and user tags, using YouTube exploratory data analysis (YEDA) and shows that such information can potentially be used to categorize videos. A deep convolutional neural network (DCNN) is designed to categorize YouTube videos with efficiency and high accuracy. In addition, a recurrent neural network (RNN) and a gated recurrent unit (GRU) are employed for performance comparison. Moreover, logistic regression, support vector machines, decision trees, and random forest models are also used. A large dataset with 9 classes is used for experiments. Experimental findings indicate that the proposed DCNN achieves the highest receiver operating characteristic (ROC) area under the curve (AUC) score of 99% in the context of YouTube video categorization and 96% accuracy, which is better than existing approaches. The proposed approach can be used to suggest relevant videos to YouTube users and sort them by video category.

7.
Article in English | MEDLINE | ID: mdl-36901240

ABSTRACT

This study examines the impact of environmental information disclosure quality on firm value for Chinese listed companies in heavily polluting industries from 2010 to 2021. By controlling for the level of leverage, growth, and corporate governance, a fixed effects model is constructed to test this relationship. Furthermore, this study analyzes the moderating effects of annual report text features, such as length, similarity, and readability, on the relationship between environmental information disclosure and firm value, as well as the heterogeneous impact of firm ownership on this relationship. The main findings of this study are as follows: There is a positive correlation between the level of environmental information disclosure and firm value for Chinese listed companies in heavily polluting industries. Annual report text length and readability positively moderate the relationship between environmental information disclosure and firm value. Annual report text similarity negatively moderates the relationship between environmental information disclosure and firm value. Compared with state-owned enterprises, the impact of environmental information disclosure quality on the firm value of non-state-owned enterprises is more significant.


Subject(s)
Disclosure , Industry , China , Organizations , Ownership
8.
PeerJ Comput Sci ; 9: e1736, 2023.
Article in English | MEDLINE | ID: mdl-38192453

ABSTRACT

Because many existing algorithms are mainly trained based on the structural features of the networks, the results are more inclined to the structural commonality of the networks. These algorithms ignore the rich external information and node attributes (such as node text content, community, and labels) that have important implications for network data analysis tasks. Existing network embedding algorithms that consider text features usually rely on the co-occurrence words in the node's text, or use an induced matrix completion algorithm to factorize the text feature matrix or the network structure feature matrix. Although this kind of algorithm can greatly improve network embedding performance, it ignores the contribution rate of different co-occurrence words in the node's text. This article proposes a network embedding learning algorithm that combines network structure and co-occurrence word features and incorporates an attention mechanism to model the weight information of the co-occurrence words. This mechanism filters out unimportant words and focuses on important words for learning and training tasks, fully considering the impact of the different co-occurrence words on the model. The proposed network representation algorithm is tested on three open datasets, and the experimental results demonstrate its strong advantages in node classification, visualization analysis, and case analysis tasks.
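Illustrative sketch (not the authors' algorithm): the general idea of weighting co-occurrence words by attention can be shown with a softmax over per-word relevance scores, followed by a weighted average of word vectors. The function names `softmax` and `attend`, the two-dimensional toy vectors, and the scores are assumptions; the abstract does not specify how the scores are computed.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_vectors, scores):
    """Weighted average of word vectors, using softmax(scores) as weights."""
    weights = softmax(scores)
    dim = len(word_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, word_vectors))
        for d in range(dim)
    ]


# Two toy word vectors; the first word gets a much higher score,
# so it dominates the resulting text representation.
vectors = [[1.0, 0.0], [0.0, 1.0]]
text_repr = attend(vectors, scores=[10.0, 0.0])
```

With a large gap between scores, the low-scoring word contributes almost nothing, which corresponds to the filtering of unimportant words that the abstract describes.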

9.
Procedia Comput Sci ; 219: 1509-1517, 2023.
Article in English | MEDLINE | ID: mdl-37205132

ABSTRACT

Health literacy is the ability to understand, process, and obtain health information and make suitable decisions about health care [3]. Traditionally, text has been the main medium for delivering health information. However, virtual assistants are gaining popularity in this digital era, and people increasingly rely on audio and smart speakers for health information. We aim to identify audio/text features that contribute to the difficulty of the information delivered over audio. We are creating a health-related audio corpus. We selected text snippets and calculated seven text features. Then, we converted the text snippets to audio snippets. In a pilot study with Amazon Mechanical Turk (AMT) workers, we measured the perceived and actual difficulty of the audio using responses to multiple choice and free recall questions. We collected demographic information as well as bias about doctors' gender, task preference, and health information preference. Thirteen workers completed thirty audio snippets and related questions. We found a strong correlation between the text feature lexical chain and the dependent variables: multiple choice response, percentage of matching words, percentage of similar words, cosine similarity, and time taken (in seconds). In addition, doctors were generally perceived to be more competent than warm. How warm workers perceive male doctors correlated significantly with perceived difficulty.
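Illustrative sketch (not the study's implementation): cosine similarity, one of the dependent variables listed above, can be computed between a source snippet and a free recall response using simple word-count vectors. The whitespace tokenization, lowercasing, the function name `cosine_similarity`, and the example sentences are assumptions; the abstract does not state how the texts were vectorized.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


snippet = "take one tablet twice a day with food"
recall = "take a tablet two times a day"
score = cosine_similarity(snippet, recall)
```

A score near 1 means the recalled text reuses most of the snippet's words in similar proportions, while a score of 0 means the two texts share no words.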

10.
Soc Netw Anal Min ; 12(1): 47, 2022.
Article in English | MEDLINE | ID: mdl-35378818

ABSTRACT

Information is spread as individuals engage with other users in the underlying social network. Analysis of social engagements can therefore provide insights to understand the motivation behind how and why users engage with others in different activities. In this study, we aim to understand the driving factors behind four engagement types on Twitter, namely like, reply, retweet, and quote. We extensively analyze a diverse set of features that reflect user behaviors, as well as tweet attributes and semantics obtained by natural language processing, including a deep learning language model, BERT. The performance of these features is assessed in a supervised task of engagement prediction by learning social engagements from over 14 million multilingual tweets. In the light of our experimental results, we find that users would engage with tweets based on text semantics and contents regardless of tweet author, yet popular and trusted authors could be important for reply and quote. Users who actively liked and retweeted in the past are likely to maintain this type of behavior in the future, while this trend is not seen in the more complex types of engagement, namely reply and quote. Moreover, users do not necessarily follow the behavior of other users with whom they have previously engaged. We further discuss the social insights obtained from the experimental results to better understand user behavior and social engagements in online social networks. Supplementary Information: The online version contains supplementary material available at 10.1007/s13278-022-00872-1.

11.
Sensors (Basel) ; 10(5): 5263-79, 2010.
Article in English | MEDLINE | ID: mdl-22399932

ABSTRACT

Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is key because inaccurately segmented text lines will lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting. Hence, text line segmentation is a leading challenge in handwritten document image processing. Due to inconsistencies in the measurement and evaluation of text segmentation algorithm quality, a basic set of measurement methods is required. Currently, there is no commonly accepted one, and all algorithm evaluation is custom oriented. In this paper, a basic test framework for the evaluation of text feature extraction algorithms is proposed. This test framework consists of a few experiments primarily linked to text line segmentation, skew rate and reference text line evaluation. Although they are mutually independent, the results obtained are strongly cross linked. In the end, its suitability for different types of letters and languages as well as its adaptability are its main advantages. Thus, the paper presents an efficient evaluation method for text analysis algorithms.
