ABSTRACT
BACKGROUND: The Posttraumatic Stress Disorder Checklist (PCL-5) is the most widely used screening tool in assessing posttraumatic stress disorder symptoms, based on the Diagnostic and Statistical Manual of Mental disorders (DSM-5) criteria. This study aimed to evaluate the psychometric properties of the newly translated Bangla PCL-5. METHODS: A cross-sectional survey was carried out among 10,605 individuals (61.0% male; mean age: 23.6 ± 5.5 [13-71 years]) during May and June 2020, several months after the onset of the COVID-19 outbreak in Bangladesh. The survey included the Bangla PCL-5 and the PHQ-9 depression scale. We used confirmatory factor analysis to test the four-factor DSM-5 model, the six-factor Anhedonia model, and the seven-factor hybrid model. RESULTS: The Bangla PCL-5 displayed adequate internal consistency (Cronbach's alpha = 0.90). The Bangla PCL-5 score was significantly correlated with scores of the PHQ-9 depression scale, confirming strong convergent validity. Confirmatory factor analyses indicated the models had a good fit to the data, including the four-factor DSM-5 model, the six-factor Anhedonia model, and the seven-factor hybrid model. Overall, the seven-factor hybrid model exhibited the best fit to the data. CONCLUSIONS: The Bangla PCL-5 appears to be a valid and reliable psychometric screening tool that may be employed in the prospective evaluation of posttraumatic stress disorder in Bangladesh.
Subjects
COVID-19, Post-Traumatic Stress Disorders, Adolescent, Adult, Anhedonia, Checklist, Cross-Sectional Studies, Diagnostic and Statistical Manual of Mental Disorders, Female, Humans, Male, Psychometrics, Reproducibility of Results, Post-Traumatic Stress Disorders/diagnosis, Young Adult
ABSTRACT
The growth of the Internet has expanded the amount of data expressed by users across multiple platforms. The availability of these different worldviews and individuals' emotions empowers sentiment analysis. However, sentiment analysis becomes even more challenging due to a scarcity of standardized labeled data in the Bangla NLP domain. The majority of the existing Bangla research has relied on deep learning models that focus heavily on context-independent word embeddings, such as Word2Vec, GloVe, and fastText, in which each word has a fixed representation irrespective of its context. Meanwhile, context-based pre-trained language models such as BERT have recently revolutionized the state of natural language processing. In this work, we applied BERT's transfer learning ability to a deep integrated CNN-BiLSTM model to improve decision-making performance in sentiment analysis. We also applied transfer learning to classical machine learning algorithms for comparison with CNN-BiLSTM. Additionally, we explore various word embedding techniques, such as Word2Vec, GloVe, and fastText, and compare their performance to the BERT transfer learning strategy. As a result, we report state-of-the-art binary classification performance for Bangla sentiment analysis that significantly outperforms all the other embeddings and algorithms compared.
Subjects
Natural Language Processing, Sentiment Analysis, Algorithms, Humans, Language, Machine Learning
ABSTRACT
A real-time Bangla Sign Language interpreter can help bring more than 200,000 hearing- and speech-impaired people into the mainstream workforce in Bangladesh. Bangla Sign Language (BdSL) recognition and detection is a challenging topic in computer vision and deep learning research because sign language recognition accuracy may vary with skin tone, hand orientation, and background. This research used deep machine learning models for accurate and reliable recognition of BdSL Alphabets and Numerals using two well-suited and robust datasets. The dataset prepared in this study comprises the largest image database for BdSL Alphabets and Numerals, built to reduce inter-class similarity while dealing with diverse image data that includes various backgrounds and skin tones. The paper compared classification with and without background images to determine the best working model for BdSL Alphabets and Numerals interpretation. The CNN model trained with images that had a background was found to be more effective than the one trained without background. The hand detection step of the segmentation approach must become more accurate to boost the overall accuracy of sign recognition. It was found that ResNet18 performed best with 99.99% accuracy, precision, F1 score, sensitivity, and 100% specificity, which outperforms the works in the literature for BdSL Alphabets and Numerals recognition. This dataset is made publicly available for researchers to support and encourage further research on Bangla Sign Language Interpretation so that hearing- and speech-impaired individuals can benefit from this research.
Subjects
Deep Learning, Sign Language, Hand, Humans, Machine Learning, Neural Networks (Computer)
ABSTRACT
We aimed to assess how suicidality has been depicted in Bangla movies and dramas. We conducted a search on YouTube by using search terms to identify movies and dramas with suicidal scripts. The search was performed between February and May 2022 resulting in 71 items consisting of 35 Bangla movies and 36 Bangla dramas. We scrutinized the contents of movies and dramas against our pre-designed instrument and we assessed their quality against World Health Organization guidelines. Among the 71 suicidal behaviors, 46.5% were suicides, 72% of the suicidal behavior was noted in young adults, 63.9% were unmarried, and 69% attempts were found in prominent characters. Hanging was found as the most prominent method (25.4%) and premarital and extramarital affairs and sexual harassment were the most prominent risk factors (60.6%). The potentially harmful characteristics were present in almost all events whereas potentially helpful contents were mentioned very minimally.
ABSTRACT
BACKGROUND: A comprehensive aphasia assessment is necessary to diagnose the type and severity of aphasia differentially and guide appropriate interventions. One component of an aphasia assessment is the picture description task (PDT), designed to probe spontaneous speech fluency and information content. Most aphasia assessments use black-and-white line drawings (LD) to elicit spontaneous language samples from people with aphasia (PWA). However, recent studies reported two visuographic variables: (1) colour (over black and white) and (2) photograph (over LD), that tended to encourage easier and faster comprehension and increased overall naturalistic language production from neurologically healthy individuals as well as PWA. Additionally, a suitable stimulus for a PDT should always be culturally relevant to the target population. Therefore, we suggest that a new PDT must include a culturally appropriate colour photograph (CP). AIMS: To investigate if a culturally appropriate CP elicits longer and more complex utterances than a culturally appropriate black-and-white LD from neurologically healthy native Bangla speakers. METHODS & PROCEDURES: A total of 30 participants (mean age = 36.03 years) were recruited based on self-reports of no known impairments in cognition, language, vision and hearing. All were of middle socioeconomic status with at least 12 years of formal education. A culturally appropriate CP was selected showing multiple characters performing various functions. Later, an artist prepared the black-and-white LD of that CP. The elicited language samples using these two pictures were transcribed and coded following preset transcription and coding guidelines. The transcribed samples were further analysed using the Bangla adaptation of Systematic Analysis of Language Transcripts (SALT) software. 
To identify the differences in language production between these two picture types, investigators used four measurement variables: mean length of utterances (MLU), complexity index (CI), total number of words (TNW) and words per minute (WPM). OUTCOMES & RESULTS: Of the four measures, only MLU showed a statistically significant difference between the CP and the black-and-white LD. CI demonstrated a strong correlation with MLU for the CP, which indicates that the participants who produced higher MLU for the CP also produced a higher CI for the CP. There were no significant differences between the two picture types for CI, TNW and WPM. CONCLUSIONS & IMPLICATIONS: This study found that the grammatical complexity, as measured by MLU, of spontaneous language production of neurologically healthy adults was higher when a CP was used in a PDT. A CP may also be beneficial for PWA to produce complex language samples. What this paper adds What is already known on the subject There are studies on neurologically healthy individuals as well as on PWA that identified the impact of using different visuographic variables (colour and photograph) separately, which enhanced the picture comprehension and improved performances on associated language production tasks. To our knowledge, no studies have identified the combined impact of these two visuographic variables on spontaneous language production. Therefore, this initial study on neurologically healthy Bangla adults reports the impact of using a CP as a stimulus item for a PDT task to elicit spontaneous language samples. What this paper adds to existing knowledge This study reports that using a culturally appropriate CP for a PDT enhances the grammatical complexity of spontaneous language production of neurologically healthy adults. To our knowledge, this is the first study in Bangla that used the MLU as a measurement variable to analyse adults' spontaneous language production. 
What are the potential or actual clinical implications of this work? The development of future aphasia assessments should consider incorporating CPs as stimuli for PDTs, which may guide speech-language pathologists to provide accurate diagnoses for aphasia and related language disorders.
Subjects
Aphasia/diagnosis, Language Tests, Photic Stimulation/methods, Adult, Art, Bangladesh, Color, Female, Healthy Volunteers, Humans, Language, Linguistics, Male, Photography, Verbal Behavior
ABSTRACT
BACKGROUND: Children with language disorder across languages have problems with verb morphology. The nature of these problems varies according to the typology of the language. The language analyzed in this paper is the Standard Bangla spoken in Dhaka, Bangladesh, by more than 200 million people. It is an underexplored language with agglutinative features in its verb inflections. Some information on the acquisition of the language by typically developing children is available, but to date we have no information on the nature of ALD. As in many places in the developing world, the circumstances for research into language disorder are challenging, as there is no well-ordered infrastructure for the identification of these children and approaches to intervention are not evidence based. This study represents the first attempt to characterize the nature of morphosyntactic limitations in standard Bangla-speaking children with language disorder. AIMS: To describe the performance of a group of children with language disorder on elicitation procedures for three Bangla verb inflections of increasing structural complexity-present simple, present progressive and past progressive-and to compare their abilities on these forms with those of a group of typically developing Bangla-speaking children. METHODS & PROCEDURES: Nine children with language disorder (mean age = 88.11 months) were recruited from a special school in Dhaka. Eight of the children also had a differentiating or co-occurring condition. They responded to three tasks: a semi-structured conversation to elicit present simple, and two picture-based tasks to elicit present progressive and past progressive. Their performance was compared with data available from a large group of younger typically developing children. 
OUTCOMES AND RESULTS: Group data indicated a trajectory of performance in the children with language disorder comparable to that of the typically developing children (present simple > present progressive > past progressive), but with significantly lower mean scores. Standard deviations suggested considerable individual variation, and individual profiles were constructed for each child, revealing varying patterns of ability, some of which did not accord with the typical developmental trajectory and/or substitution patterns. CONCLUSIONS & IMPLICATIONS: This study identified verb morphology deficits in Bangla-speaking children with language disorder who had associated conditions. Variation in performance among the children suggests that individual profiles will be most effective in guiding intervention.
Subjects
Child Language, Language Development Disorders/psychology, Age Factors, Bangladesh, Child, Child Behavior, Preschool Child, Female, Humans, Language Development Disorders/diagnosis, Male
ABSTRACT
OBJECTIVES: In order to assist mental health services in developing countries, a key issue is the availability of psychometrically sound, brief, and cost-effective measures that have been tested within the relevant context. The present study was designed to evaluate within a young Bangladeshi population, the psychometric properties of two widely used Western measures of internalizing distress in young people: the short form of the Spence Children's Anxiety Scale and the Short Moods and Feelings Questionnaire. METHOD: The sample included 1,360 children and adolescents aged 9-17 years (M = 12.3 years, SD = 2.12) recruited from six districts of Bangladesh, including both community and emotionally at-risk participants. A total of 179 children were re-tested on the measures within 3-4 weeks. RESULTS: Confirmatory factor analyses showed single-factor structures for both scales in the total sample and in both community and at-risk participants separately. Multiple group analyses across gender and age-group within the at-risk and community samples showed that the single-factor structure was suitable regardless of subgroup. Analyses also indicated acceptable internal consistency, test-retest reliability and construct validity for both scales. CONCLUSION: The two measures show promise as brief, reliable, and valid instruments for the assessment of internalizing distress among young people from Bangla-speaking communities. PRACTITIONER POINTS: Positive clinical implications: These two measures of internalizing distress in young people showed solid psychometric properties within samples collected from various parts of Bangladesh. The measures can therefore be used to assess anxiety and depression in Bangla-speaking youth. These measures should be of value in both clinical settings and at a community level to assess the need for services. 
Cautions and limitations: Resource limitations did not allow comparison against diagnostic criteria and therefore cut-off scores to indicate clinical status among Bangladeshi youth will require further research.
Subjects
Anxiety Disorders/diagnosis, Anxiety/diagnosis, Defense Mechanisms, Psychometrics/statistics & numerical data, Surveys and Questionnaires, Adolescent, Affect, Anxiety/psychology, Anxiety Disorders/psychology, Bangladesh, Child, Depression/psychology, Depressive Disorder, Emotions, Factor Analysis (Statistical), Female, Humans, Male, Mental Health Services, Psychiatric Status Rating Scales, Reproducibility of Results
ABSTRACT
Equation Recognition is a mathematical task of identifying equations, which has significance in developing different mathematical systems. In this paper, we introduce a novel Bangla mathematical equation dataset comprising 3430 observations aimed at advancing mathematical Equation Recognition in the Bangla language. To the best of our knowledge, no such dataset exists that was developed to recognize equations from the text. Each entry in the dataset includes a mathematical statement and the corresponding equation. This resource can significantly support research in mathematical Equation Recognition, including the identification of common mathematical operations (such as addition, subtraction, multiplication, division, and roots) and numerical values. With minor adjustments, researchers can also explore combinations of these findings. The dataset is raw and conveniently structured in CSV format, with two columns: "Text" and "Equation," facilitating easy handling for various deep learning and machine learning tasks.
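The two-column layout described above can be read with a few lines of Python. This is only a minimal sketch: the column names "Text" and "Equation" come from the abstract, while the sample rows, the `SAMPLE_CSV` string, and the `load_pairs` helper are hypothetical placeholders invented for illustration (they are English stand-ins, not entries from the dataset).

```python
import csv
import io

# Hypothetical sample in the two-column layout the abstract describes.
# These rows are invented placeholders, not taken from the real dataset.
SAMPLE_CSV = '''Text,Equation
"the sum of five and three",5+3
"twelve divided by four",12/4
'''


def load_pairs(handle):
    """Return a list of (text, equation) tuples from an open CSV handle."""
    reader = csv.DictReader(handle)
    return [(row["Text"], row["Equation"]) for row in reader]


# In practice the handle would come from open(path, encoding="utf-8", newline="").
pairs = load_pairs(io.StringIO(SAMPLE_CSV))
for text, equation in pairs:
    print(f"{text!r} -> {equation}")
```

Reading by header name rather than by column position keeps the loader working if the file's column order differs from what is assumed here.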
ABSTRACT
Mathematical entity recognition is essential for machines to define and represent mathematical content accurately and to support adequate mathematical operations and reasoning. As mathematical entity recognition in the Bangla language is novel, to the best of our knowledge no such dataset is available in any repository. In this paper, we present a state-of-the-art Bangla mathematical entity dataset containing 13,717 observations. Each record has a mathematical statement, mathematical type and mathematical entity. This dataset can be used for research on recognizing mathematical operators, well-known mathematical terms (such as complex numbers, real numbers, prime numbers, etc.), and operands as numbers. Combinations of these tasks are also feasible with modest adjustments to the dataset. Furthermore, we have structured this dataset in raw format as a CSV file with three columns: text, math entity, and label. As a result, researchers can easily handle the data for a variety of deep learning and machine learning explorations.
ABSTRACT
Mathematical entity recognition is indispensable for machines to accurately explain and depict mathematical content and to enable adequate mathematical operations and reasoning. It expedites automated theorem proving, speeds up the analysis and retrieval of mathematical knowledge from documents, and improves e-learning and educational platforms. It also simplifies translation, scientific research, data analysis, interpretation, and the practical application of mathematical information. Mathematical entity recognition in the Bangla language is novel; to the best of our knowledge, no similar work has been done. Here, we identify mathematical operators, operands as numbers, and popular mathematical terms (complex numbers, real numbers, prime numbers, etc.). In this work, we perform Bangla Mathematical Entity Recognition (MER) using Bidirectional Encoder Representations from Transformers (BERT), a deep neural network architecture, both as a single model and in an ensemble. We prepare a novel dataset comprising 13,717 observations, each containing a mathematical statement, mathematical entity, and mathematical type. We evaluate our proposed architectures using accuracy, precision, recall and F1-score as the performance metrics. The results show a satisfactory accuracy of 97.98% with BERT and 99.76% with ensemble BERT.
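The four metrics named above can be computed directly from true and predicted labels. The sketch below is illustrative only: the label names ("operator", "operand", "term") and the toy predictions are assumptions made for the example, not data or results from the study, and the helper functions are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class, treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only.
y_true = ["operator", "operand", "term", "operand", "operator", "term"]
y_pred = ["operator", "operand", "operand", "operand", "operator", "term"]

print("accuracy:", accuracy(y_true, y_pred))
print("operand:", per_class_metrics(y_true, y_pred, "operand"))
```

Per-class precision and recall are worth reporting alongside accuracy because a high overall accuracy can hide weak performance on a rare entity type.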
ABSTRACT
Object recognition technology has made significant strides, but recognizing handwritten Bangla characters (including symbols, compound forms, etc.) remains a challenging problem due to the prevalence of cursive writing and many ambiguous characters. The complexity and variability of the Bangla script and individuals' unique handwriting styles make it difficult to achieve satisfactory performance for practical applications, and the best existing recognizers are far less effective than those developed for English alphanumeric characters. Compared to other major languages, there are limited options for recognizing handwritten Bangla characters. This research describes a new dataset to improve the accuracy and effectiveness of handwriting recognition systems for the Bengali language, which is spoken by over 200 million people worldwide. This dataset aims to support the investigation and recognition of Bangla handwritten characters, focusing on enlarging the set of recognized character classes. To achieve this, a new challenging dataset for handwriting recognition is introduced, collected from the handwriting of numerous students at two institutions.
ABSTRACT
This study presents a large multi-modal Bangla YouTube clickbait dataset consisting of 253,070 data points collected through an automated process using the YouTube API and Python web automation frameworks. The dataset contains 18 diverse features categorized into metadata, primary content, engagement statistics, and labels for individual videos from 58 Bangla YouTube channels. A rigorous preprocessing step has been applied to denoise, deduplicate, and remove bias from the features, ensuring unbiased and reliable analysis. As the largest and most robust clickbait corpus in Bangla to date, this dataset provides significant value for natural language processing and data science researchers seeking to advance modeling of clickbait phenomena in low-resource languages. Its multi-modal nature allows for comprehensive analyses of clickbait across content, user interactions, and linguistic dimensions to develop more sophisticated detection methods with cross-linguistic applications.
ABSTRACT
The speech-to-song illusion is a phenomenon in which the continuous repetition of a spoken utterance induces the listeners to perceive it as more song-like. Thus far, this perceptual transformation has been observed in mostly European languages, such as English; however, it is unclear whether the illusion is experienced by speakers of Bangla (Bengali), an Indo-Aryan language. The current study, therefore, investigates the illusion in 28 Bangla- and 31 English-speaking participants. The experiment consisted of a listening task in which participants were asked to rate their perception of repeating short speech stimuli on a scale from 1-5, where 1 = "sounds like speech" and 5 = "sounds like song". The stimuli were composed of English and Bangla utterances produced by two bilingual speakers. To account for possible group differences in music engagement, participants self-reported musical experience and also performed a rhythm discrimination task as an objective measure of non-verbal auditory sequence processing. Stimulus ratings were analysed with cumulative link mixed modelling. Overall, English- and Bangla-speaking participants rated the stimuli similarly and, in both groups, better performance in the rhythm discrimination task significantly predicted more song-like ratings beyond self-reported musical experience. An exploratory acoustic analysis revealed a role of harmonic ratio in the illusion for both language groups. These results demonstrate that the speech-to-song-illusion occurs for Bangla speakers to a similar extent as English speakers and that, across both groups, sensitivity to non-verbal auditory structure is positively correlated with susceptibility to this perceptual transformation.
ABSTRACT
Background: Effective communication skill of physicians is an important component of high-quality healthcare delivery and safe patient care. Communication is embedded in the social and cultural contexts where it takes place. An understanding of medical students' attitudes and learning communication skills would help to design and deliver culturally appropriate medical education. The Communication Skills Attitude Scale (CSAS) is a widely used and validated tool to measure the attitude of medical students toward learning communication skills in different populations, settings, and countries. However, there is no culturally adapted and validated scale in Bangla in the Bangladesh context. This study aims to culturally adapt the CSAS into Bangla, and validate it in a cohort of medical students in Bangladesh. Methods: This study used a cross-sectional survey design to collect data from purposively selected 566 undergraduate medical students from the Rajshahi division. The survey was conducted from January to December 2023. Descriptive statistics like frequency distribution and measures of central tendency were used to measure perception regarding communication skills. The sample adequacy was measured through the Kaiser-Meyer-Olkin test. The internal consistency of the items was identified using Cronbach's alpha (α) coefficients. Result: The results of the study show that the Bangla version of the scale is feasible, valid, and internally consistent in the context of a developing country, Bangladesh. The overall internal consistency of the Bangla version is good since the value of Cronbach's alpha (α) is 0.882. For PAS, the internal consistency is 0.933. While, for NAS, the value is 0.719. The item-wise average scores in the PAS indicate that female medical students are more willing to learn communication skills compared with male students (α = 0.933). 
The scores in the NAS, in turn, indicate that male students tend to have a more negative attitude toward learning communication skills than female students (α = 0.719). Conclusion: The CSAS-Bangla is a valid and reliable tool for assessing communication skill attitudes among Bangla-speaking medical students. This scale can be used in future studies to measure students' attitudes and to design and evaluate communication skills training programs in medical colleges.
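For readers unfamiliar with the internal-consistency statistic reported above, Cronbach's alpha for k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The sketch below applies that standard formula to a tiny, invented response matrix; the `cronbach_alpha` helper and the numbers are hypothetical and are not the study's data or software.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Transpose so that each tuple holds every respondent's score on one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Invented example: 4 respondents answering 3 Likert-type items.
toy_responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(toy_responses), 3))
```

Sample variances are used for both the items and the total score; using population variances throughout would give the same alpha, since the scaling factor cancels in the ratio.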
ABSTRACT
Background: The 6-item Female Sexual Function Index (FSFI-6) is the shortened version of the widely used 19-item FSFI-19, designed for efficient screening of female sexual dysfunction in outpatient settings. However, this shorter FSFI-6 tool has not yet been validated for use in Bangladesh. Aim: The purpose of this study was to culturally adapt and validate the FSFI-6 in Bangla. Methods: The FSFI-6 was translated into Bangla using standard adaptation protocols. We interviewed 100 married, sexually active women aged 18 years and over from the outpatient and psychiatric sex clinic of a psychiatry department. Of these women, 50 were clinically diagnosed with sexual disorders based on the Diagnostic and Statistical Manual of Mental Disorders, 5th edition, criteria. After obtaining written informed consent, participants completed a semi-structured questionnaire to provide sociodemographic information and the Bangla-adapted version of the FSFI-6. We assessed reliability and construct validity using the Statistical Package for Social Sciences, version 25, along with Classical and Bayesian Instrument Development software. Outcome: Study outcomes were internal consistency, factor structure, and sensitivity and specificity. Results: The study involved 100 participants with a mean ± SD age of 30 ± 5.4 years, ranging from 18 to 48 years. The majority of respondents (54.34%) reported issues related to sexual desire. The overall mean score on the Bangla-adapted FSFI-6 was 18.4 ± 5.4. Reliability analysis showed a high internal consistency, with a Cronbach's alpha of 0.887 indicating robust reliability. Both inter-item correlations and item-total correlations were within the acceptable range. A cutoff value of 19 for the FSFI-6 demonstrated high discriminative power, effectively distinguishing between individuals with sexual disorders and those without sexual disorders or with other psychiatric conditions. The sensitivity at this cutoff was 96%, with a specificity of 100%. 
Clinical Implications: The FSFI-6 Bangla version can be used to screen patients for female sexual dysfunction in an outpatient setting. Strengths and Limitations: The internal consistency of the instrument, indicated by a Cronbach's alpha of 0.887, was robust. The instrument is time efficient, user friendly, and well suited for outpatient settings. However, the sampling was nonrandomized and confined to a single institution, and the study did not assess concurrent validity or test-retest reliability. Conclusion: The FSFI-6 Bangla version showed good reliability and validity in this study, supporting its usability as a valuable tool for screening sexual dysfunction in women.
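Sensitivity and specificity at a screening cutoff follow from a simple count of true and false positives and negatives. The sketch below illustrates the calculation under a stated assumption: scores at or below the cutoff are treated as screening positive, which the abstract does not spell out. The scores, diagnoses, and the `sensitivity_specificity` helper are invented for illustration and are not the study's data.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Return (sensitivity, specificity), flagging scores <= cutoff as positive.

    The 'lower score means positive screen' direction is an assumption.
    """
    tp = fn = tn = fp = 0
    for score, diagnosed in zip(scores, has_condition):
        flagged = score <= cutoff
        if diagnosed and flagged:
            tp += 1
        elif diagnosed:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented scores and reference diagnoses for eight hypothetical participants.
toy_scores = [12, 15, 19, 21, 22, 25, 27, 20]
toy_diagnosed = [True, True, True, True, False, False, False, False]

sens, spec = sensitivity_specificity(toy_scores, toy_diagnosed, cutoff=19)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sweeping `cutoff` over the observed score range and recording both values at each step is the usual way to see how the trade-off shifts before a threshold is fixed.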
ABSTRACT
Background Depression, anxiety, and stress are leading causes of disability worldwide and major contributors to suicide. The burden of these disorders among the Indian geriatric population is often described as a silent epidemic. The sudden emergence of the COVID-19 pandemic has only intensified this public health problem. Finding out factors associated with poor mental health is critical to improving overall healthcare for high-risk patients, especially in underserved and inaccessible communities. Aim This study was conducted to measure the prevalence rates of depression, anxiety, and stress and their sociodemographic correlates among the Indian geriatric patient population. This study also aimed to assess the coping strategies employed and difficulties faced by the population during the COVID-19 pandemic. Methods A cross-sectional survey was conducted using a pre-designed and pre-tested questionnaire. A total of 107 participants were recruited through convenience sampling. Depression, anxiety, and stress were measured using the Bangla version of the Depression, Anxiety, and Stress Scale (DASS-21 BV), a 21-item self-reported questionnaire. Results Of the sampled group, 43.9%, 32.7%, and 34.6% were moderately to extremely severely depressed, anxious, and stressed, respectively. Factors associated with worse mental health were increasing age, female gender, living separately from their spouses, unemployment, retirement, or any occupation that did not require one to leave their house. Of the sample population, 80.3% had experienced a loss of income due to the pandemic. The most frequently used coping strategy was to solve problems they faced daily, closely followed by praying and participating in religious activities. Conclusion Depression, anxiety, and stress showed a higher prevalence than previously described, before the pandemic. This could be due to the effects of the COVID-19 pandemic. 
Our study also demonstrated some of the factors associated with and the most commonly used ways to tackle poor mental health. Adequate educational awareness programmes that are accessible in different regional languages, strengthening mental health infrastructure, and community mental health services will significantly improve outcomes, especially among high-risk populations.
ABSTRACT
Reading comprehension (RC) is steadily gaining popularity in the Bangla Natural Language Processing (NLP) research area, in both machine learning and deep learning techniques. However, there is no original Bangla dataset drawn from various sources; the existing ones are translated from foreign RC datasets and contain abnormalities and mismatched translations. In this paper, we present UDDIPOK, a novel wide-ranging, open-domain Bangla reading comprehension dataset. This dataset contains 270 reading passages, 3636 questions, and answers from diverse origins, for instance, textbooks, exam questions from middle and high schools, newspapers, etc. Furthermore, this dataset is formatted as a CSV file with three columns: passages, questions, and answers. As a result, the data can be handled quickly and easily for any machine learning research.
ABSTRACT
In spite of being the fifth most spoken native language in the world, Bangla has barely received any attention in the domain of audio and speech recognition. This article presents a speech dataset of Bengali abusive words, along with some non-abusive words that are very close to the abusive ones. In this work, a multipurpose dataset is presented for automatic recognition of slang speech in the Bangla language, prepared through collection, annotation, and refinement of data. It consists of 114 slang words and 43 non-slang words with 6100 audio clips. Sixty native speakers contributed the slang words and 23 native speakers contributed the non-abusive words, speaking in various dialects from over 20 districts of Bangladesh; 10 university students took part in evaluating the dataset, including annotation and refinement. Researchers can use this dataset to develop an automatic Bengali slang speech recognition system, and it can also serve as a new benchmark for creating speech recognition-based machine learning models. This dataset can be enriched further, and some background noise in the dataset can be used to simulate a more real-world scenario if desired. Otherwise, this noise can be removed.
RESUMO
Speech Emotion Recognition (SER) identifies and categorizes emotional states by analyzing speech signals. SER is an emerging research area that uses machine learning and deep learning techniques and has socio-cultural and business importance. An appropriate dataset is an important resource for SER studies in a particular language. There is an apparent lack of SER datasets in Bangla, although it is one of the most spoken languages in the world. A few Bangla SER datasets exist, but they consist of only a few dialogs spoken by a minimal number of actors, which makes them unsuitable for real-world applications. Moreover, the existing datasets do not consider the intensity level of emotions. The intensity of a specific emotional expression, such as anger or sadness, plays a crucial role in social behavior. Therefore, a realistic Bangla speech dataset, called the KUET Bangla Emotional Speech (KBES) dataset, is developed in this study. The dataset consists of 900 audio signals (i.e., speech dialogs) from 35 actors (20 females and 15 males) of diverse ages. The speech dialogs are sourced from Bangla telefilms, dramas, TV series, and web series. There are five emotional categories: Neutral, Happy, Sad, Angry, and Disgust. Except for Neutral, the samples of each emotion are divided into two intensity levels: Low and High. A notable feature of the dataset is that the speech dialogs are almost all unique and come from a relatively large number of actors, whereas existing datasets (such as SUBESCO and BanglaSER) contain a few pre-defined dialogs spoken repeatedly by a few actors or research volunteers in a laboratory environment. Finally, the KBES dataset is posed as a nine-class problem that classifies emotions into nine categories: Neutral, Happy (Low), Happy (High), Sad (Low), Sad (High), Angry (Low), Angry (High), Disgust (Low), and Disgust (High).
The dataset is kept symmetrical, with 100 samples for each of the nine classes; each set of 100 samples is also gender balanced, with 50 samples each from male and female actors. Compared with the existing SER datasets, the developed dataset appears more realistic.
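The class layout described above (five emotions, two intensity levels for every emotion except Neutral, 100 samples per class) can be checked with a few lines of arithmetic. This is an illustrative sketch only; the label strings follow the abstract, while the variable and function names are hypothetical.

```python
EMOTIONS = ["Neutral", "Happy", "Sad", "Angry", "Disgust"]
INTENSITIES = ["Low", "High"]

SAMPLES_PER_CLASS = 100  # stated in the abstract
MALE_PER_CLASS = 50      # gender balance stated in the abstract
FEMALE_PER_CLASS = 50


def build_class_labels():
    """Return the class labels: Neutral has no intensity, the rest have two."""
    labels = []
    for emotion in EMOTIONS:
        if emotion == "Neutral":
            labels.append(emotion)
        else:
            labels.extend(f"{emotion} ({level})" for level in INTENSITIES)
    return labels


labels = build_class_labels()
total_samples = len(labels) * SAMPLES_PER_CLASS
```

The count works out as 1 + 4 × 2 = 9 classes and 9 × 100 = 900 samples, which matches the 900 audio signals reported for the dataset.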
RESUMO
Sign Language Recognition (SLR) is crucial for enabling communication between the deaf-mute and hearing communities. Nevertheless, developing a comprehensive sign language dataset is challenging because of the complexity of and variation in hand gestures. This challenge is particularly evident for Bangla Sign Language (BdSL), where the limited availability of depth datasets impedes accurate recognition. To address this issue, we propose BdSL47, an open-access depth dataset for 47 one-handed static signs of BdSL (10 digits, from ০ to ৯, and 37 letters). The dataset was created using the MediaPipe framework to extract depth information. To classify the signs, we developed an Artificial Neural Network (ANN) model with a 63-node input layer, a 47-node output layer, and 4 hidden layers, with dropout in the last two hidden layers, an Adam optimizer, and a ReLU activation function. With the selected hyperparameters, the proposed ANN model effectively learns the spatial relationships and patterns in the depth-based gestural input features and achieves an F1 score of 97.84%, indicating the effectiveness of the approach compared to the baselines provided. The availability of BdSL47 as a comprehensive dataset can help improve the accuracy of SLR for BdSL using more advanced deep-learning models.
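One plausible reading of the 63-node input layer is that it holds 21 hand landmarks with three coordinates (x, y, z) each, which is the output shape of MediaPipe's hand-landmark model; the abstract does not state this, so it is an assumption. The sketch below only shows how such landmarks could be flattened into a 63-value feature vector. The function name, constants, and dummy data are hypothetical, and the hidden-layer widths of the ANN are not given in the abstract, so no network is built here.

```python
NUM_LANDMARKS = 21        # assumption: MediaPipe hand landmarks per hand
COORDS_PER_LANDMARK = 3   # assumption: x, y, and depth (z) per landmark
NUM_FEATURES = NUM_LANDMARKS * COORDS_PER_LANDMARK  # 63, matches the input layer

NUM_DIGITS = 10
NUM_LETTERS = 37
NUM_CLASSES = NUM_DIGITS + NUM_LETTERS  # 47, matches the output layer


def flatten_landmarks(landmarks):
    """Flatten a sequence of (x, y, z) landmarks into one flat feature list."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    features = []
    for point in landmarks:
        if len(point) != COORDS_PER_LANDMARK:
            raise ValueError("each landmark needs exactly three coordinates")
        features.extend(float(value) for value in point)
    return features


# Dummy landmarks standing in for one detected hand (not real data).
dummy_hand = [(i * 0.01, i * 0.02, -i * 0.001) for i in range(NUM_LANDMARKS)]
features = flatten_landmarks(dummy_hand)
```

A classifier with a 63-node input and a 47-node output would take one such vector per image and return one score per sign class.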