Results 1 - 6 of 6
1.
PLoS One; 18(12): e0295339, 2023.
Article in English | MEDLINE | ID: mdl-38096324

ABSTRACT

Idiomatic expressions are built into all languages and are common in ordinary conversation. Idioms are difficult to understand because their meaning cannot be deduced directly from their constituent words. Previous studies have reported that idiomatic expressions affect many natural language processing tasks in the Amharic language. However, most natural language processing models used with Amharic, such as machine translation, semantic analysis, sentiment analysis, information retrieval, question answering, and next-word prediction, do not consider idiomatic expressions. In this paper, we therefore propose a convolutional neural network (CNN) with a FastText embedding model for detecting idioms in Amharic text. To assess the proposed model's performance, we collected 1,700 idiomatic and 1,600 non-idiomatic expressions from Amharic books and evaluated the model on this dataset, using an 80/10/10 split for training, validation, and testing. The proposed model reaches 98% accuracy on the training dataset and 80% accuracy on the testing dataset. We compared the proposed model to machine learning models such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest classifiers. According to the experimental results, the proposed model produces promising results.
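The abstract does not include code; the following is a minimal Python sketch of the described approach, assuming gensim for the FastText embeddings and Keras for the CNN. The toy expressions, vector size, and layer sizes are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch: a small 1-D CNN classifier over FastText-style word vectors,
# mirroring the idiom-detection setup described above.
import numpy as np
from gensim.models import FastText
from tensorflow.keras import layers, models

# Toy tokenized expressions standing in for the idiomatic/non-idiomatic corpus.
texts = [["yene", "lib", "wesede"], ["bet", "hede"]]
labels = np.array([1, 0])            # 1 = idiomatic, 0 = non-idiomatic

# Train (or load) a FastText embedding model on the corpus.
ft = FastText(sentences=texts, vector_size=100, window=3, min_count=1, epochs=10)

max_len = 10
def to_matrix(tokens):
    """Pad/truncate a token list into a (max_len, 100) embedding matrix."""
    vecs = [ft.wv[t] for t in tokens[:max_len]]
    vecs += [np.zeros(100)] * (max_len - len(vecs))
    return np.stack(vecs)

X = np.stack([to_matrix(t) for t in texts])

# 1-D convolution over the embedded token sequence, binary sigmoid output.
model = models.Sequential([
    layers.Input(shape=(max_len, 100)),
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=5, verbose=0)
```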


Subjects
Deep Learning, Language, Semantics, Neural Networks (Computer), Machine Learning
2.
PLOS Digit Health; 2(7): e0000308, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37467222

ABSTRACT

Cancer is a broad term that refers to a wide range of diseases that can affect any part of the human body. To minimize the number of cancer deaths and to prepare appropriate health policy on mitigating the spread of cancer, scientifically supported knowledge of cancer causes is critical. In this study, we therefore analyzed the lung cancer risk factors that lead to highly severe cancer cases using a decision tree-based ranking algorithm. This feature relevance ranking algorithm computes the weight of each feature of the dataset from its split points to improve detection accuracy, weighting each risk factor by the number of observations that pass through it in the decision tree. Coughing of blood, air pollution, and obesity are the most severe of the nine lung cancer risk factors considered, with weights of 39%, 21%, and 14%, respectively. We also propose a machine learning model that uses Extreme Gradient Boosting (XGBoost) to detect severity levels in lung cancer patients. We used a dataset of 1,000 lung cancer patients and 465 individuals free from lung cancer from Tikur Ambesa (Black Lion) Hospital in Addis Ababa, Ethiopia, to assess the performance of the proposed model. The proposed cancer severity level detection model achieved 98.9% accuracy, 99% precision, and 98.9% recall on the testing dataset. The findings can assist governments and non-governmental organizations in making lung cancer-related policy decisions.
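As a rough illustration of the two steps described above, the sketch below ranks risk factors with a decision tree's split-based (impurity) importances and then fits an XGBoost classifier. The feature names, synthetic data, and hyperparameters are placeholders, and scikit-learn's impurity weighting only approximates the observation-count weighting the paper describes.

```python
# Illustrative sketch only: decision-tree feature ranking followed by XGBoost.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
features = ["coughing_of_blood", "air_pollution", "obesity",
            "smoking", "alcohol_use", "dust_allergy",
            "genetic_risk", "chronic_lung_disease", "balanced_diet"]
# Synthetic placeholder data, not the hospital dataset.
X = pd.DataFrame(rng.integers(1, 9, size=(200, len(features))), columns=features)
y = rng.integers(0, 3, size=200)     # severity level: 0=low, 1=medium, 2=high

# Feature relevance from decision-tree split points (sample-weighted impurity decrease).
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
ranking = sorted(zip(features, tree.feature_importances_),
                 key=lambda kv: kv[1], reverse=True)
for name, weight in ranking:
    print(f"{name:22s} {weight:.2%}")

# Severity-level detection with gradient boosting.
xgb = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                    eval_metric="mlogloss")
xgb.fit(X, y)
print("train accuracy:", xgb.score(X, y))
```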

3.
PLoS One; 17(8): e0273156, 2022.
Article in English | MEDLINE | ID: mdl-35980997

ABSTRACT

Next-word prediction is useful for users and helps them write more accurately and quickly. It is especially important for the Amharic language, since different characters are written by pressing the same consonants together with different vowels, vowel combinations, and special keys. We therefore present a Bi-directional Long Short Term-Gated Recurrent Unit (BLST-GRU) network model for next-word prediction in Amharic. We evaluated the proposed model on 63,300 Amharic sentences, on which it achieves 78.6% accuracy. In addition, we compared the proposed model with state-of-the-art models such as LSTM, GRU, and BLSTM. The experimental results show that the proposed network model produces promising results.
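A minimal Keras sketch of one plausible reading of the BLST-GRU architecture, a bidirectional LSTM feeding a GRU layer with a softmax over the vocabulary, is given below; the vocabulary size, context length, and random training pairs are toy assumptions rather than the paper's setup.

```python
# Hypothetical sketch of a next-word prediction model: BiLSTM -> GRU -> softmax.
import numpy as np
from tensorflow.keras import layers, models

vocab_size = 5000      # assumed vocabulary size
seq_len = 5            # context window of preceding words

# Toy training pairs: each row of X is a word-id context, y is the next word id.
X = np.random.randint(1, vocab_size, size=(64, seq_len))
y = np.random.randint(1, vocab_size, size=(64,))

model = models.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 128),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.GRU(128),
    layers.Dense(vocab_size, activation="softmax"),  # distribution over the next word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, verbose=0)

# Predict the most likely next word id for one context.
context = X[:1]
print(int(model.predict(context, verbose=0).argmax(axis=-1)[0]))
```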


Subjects
Language, Neural Networks (Computer), Long-Term Memory, Short-Term Memory
4.
PeerJ Comput Sci; 8: e961, 2022.
Article in English | MEDLINE | ID: mdl-35634124

ABSTRACT

Text classification is the process of categorizing documents into a predefined set of categories based on their content. Text classification algorithms typically represent documents as collections of words and therefore deal with a large number of features, so selecting appropriate features becomes important when the initial feature set is large. In this paper, we present a hybrid document frequency (DF) and genetic algorithm (GA)-based feature selection method for Amharic text classification. We evaluate this feature selection method on Amharic news documents obtained from the Ethiopian News Agency (ENA), using 13 categories. Our experimental results show that the proposed feature selection method outperforms other feature selection methods used for Amharic news document classification. Combining the proposed feature selection method with an Extra Tree Classifier (ETC) improves classification accuracy: up to 1% higher than a hybrid of DF, information gain (IG), chi-square (CHI), and principal component analysis (PCA), 2.47% higher than GA alone, and 3.86% higher than a hybrid of DF, IG, and CHI.
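The following sketch illustrates the hybrid idea under stated assumptions: a document-frequency filter via CountVectorizer's min_df, followed by a simple genetic algorithm over binary feature masks whose fitness is cross-validated classifier accuracy. The corpus (a scikit-learn stand-in, not the ENA data), the classifier, and the GA hyperparameters are all illustrative.

```python
# Rough sketch of DF filtering followed by GA-based feature selection.
import numpy as np
from sklearn.datasets import fetch_20newsgroups   # stand-in corpus, not the ENA data
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

docs = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X = CountVectorizer(min_df=5, max_features=300).fit_transform(docs.data)  # DF filter
y = docs.target
n_features = X.shape[1]

def fitness(mask):
    """Cross-validated accuracy of a classifier restricted to the masked features."""
    if mask.sum() == 0:
        return 0.0
    cols = np.flatnonzero(mask)
    return cross_val_score(LogisticRegression(max_iter=1000), X[:, cols], y, cv=3).mean()

rng = np.random.default_rng(0)
pop = rng.integers(0, 2, size=(12, n_features))          # random initial population
for generation in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-6:]]                # keep the fittest half
    children = []
    while len(children) < len(pop):
        a, b = parents[rng.integers(0, len(parents), 2)]
        cut = rng.integers(1, n_features)                 # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.01              # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.array(children)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", int(best.sum()), "of", n_features)
```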

5.
Comput Intell Neurosci; 2021: 3774607, 2021.
Article in English | MEDLINE | ID: mdl-34354742

ABSTRACT

For decades, machine learning techniques have been used to process Amharic texts, but the potential of deep learning for Amharic document classification has not been exploited due to a lack of language resources. In this paper, we present a deep learning model for Amharic news document classification. The proposed model uses fastText to generate text vectors that represent the semantic meaning of texts, addressing the limitations of traditional representation methods. The matrix of text vectors is then fed into the embedding layer of a convolutional neural network (CNN), which automatically extracts features. We conducted experiments on a dataset with six news categories, and our approach produced a classification accuracy of 93.79%. We compared our method to well-known machine learning algorithms such as support vector machine (SVM), multilayer perceptron (MLP), decision tree (DT), XGBoost (XGB), and random forest (RF), and it achieved good results.
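The representation step described above can be sketched as initializing a CNN's embedding layer from fastText vectors. The toy documents, vector size, sequence length, and layer configuration below are assumptions rather than the paper's setup.

```python
# Illustrative sketch only: build an embedding matrix from fastText vectors and use it
# to initialize a CNN's embedding layer for news classification.
import numpy as np
from gensim.models import FastText
from tensorflow.keras import layers, models
from tensorflow.keras.initializers import Constant

docs = [["addis", "zena", "sport"], ["politika", "zena", "hager"]]  # tokenized toy docs
ft = FastText(sentences=docs, vector_size=100, min_count=1, epochs=10)

# Word index (0 reserved for padding) and the matching embedding matrix.
vocab = sorted({w for d in docs for w in d})
word_index = {w: i + 1 for i, w in enumerate(vocab)}
emb = np.zeros((len(vocab) + 1, 100))
for word, idx in word_index.items():
    emb[idx] = ft.wv[word]      # fastText also covers unseen words via subword n-grams

model = models.Sequential([
    layers.Input(shape=(20,)),                                   # padded word-id sequences
    layers.Embedding(len(vocab) + 1, 100,
                     embeddings_initializer=Constant(emb), trainable=False),
    layers.Conv1D(128, 5, activation="relu", padding="same"),    # automatic feature extraction
    layers.GlobalMaxPooling1D(),
    layers.Dense(6, activation="softmax"),                       # six news categories
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```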


Subjects
Deep Learning, Language, Machine Learning, Neural Networks (Computer), Support Vector Machine
6.
PLoS One; 16(5): e0251902, 2021.
Article in English | MEDLINE | ID: mdl-34019571

ABSTRACT

The volume of Amharic digital documents has grown rapidly in recent years, so automatic document categorization is highly essential. In this paper, we present a novel dimension reduction approach that improves classification accuracy by combining feature selection and feature extraction. The new dimension reduction method uses Information Gain (IG), the Chi-square test (CHI), and Document Frequency (DF) to select important features, and Principal Component Analysis (PCA) to refine the selected features. We evaluate the proposed dimension reduction method on a dataset containing 9 news categories. Our experimental results verify that the proposed method outperforms the others: classification accuracy with the new dimension reduction is 92.60%, which is 13.48%, 16.51%, and 10.19% higher than with IG, CHI, and DF, respectively. Further work is required, since classification accuracy still decreases as we reduce the feature size to save computational time.
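A hedged sketch of the combined pipeline, document-frequency pruning, chi-square and mutual-information (an information-gain analogue) selection, then PCA on the retained features, is shown below; the stand-in corpus, thresholds, and component count are placeholder assumptions rather than the paper's configuration.

```python
# Sketch: DF pruning + CHI/IG selection + PCA refinement.
import numpy as np
from sklearn.datasets import fetch_20newsgroups        # stand-in for the Amharic corpus
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.decomposition import PCA

docs = fetch_20newsgroups(subset="train", categories=["sci.med", "talk.politics.misc"])
# DF selection: min_df drops terms appearing in fewer than 5 documents.
X = TfidfVectorizer(min_df=5, max_features=2000).fit_transform(docs.data)
y = docs.target

# CHI and IG selection, each keeping the 500 highest-scoring terms; take their union.
chi_idx = SelectKBest(chi2, k=500).fit(X, y).get_support(indices=True)
ig_idx = SelectKBest(mutual_info_classif, k=500).fit(X, y).get_support(indices=True)
keep = np.union1d(chi_idx, ig_idx)

# PCA refines the selected features into a small dense representation.
X_reduced = PCA(n_components=100).fit_transform(X[:, keep].toarray())
print(X_reduced.shape)
```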


Subjects
Data Mining/methods, Information Technology, Linguistics/statistics & numerical data, Multifactor Dimensionality Reduction/statistics & numerical data, Support Vector Machine, Datasets as Topic, Ethiopia, Humans, Language, Principal Component Analysis