1.
Entropy (Basel) ; 20(1), 2018 Jan 22.
Article in English | MEDLINE | ID: mdl-33265164

ABSTRACT

The paper examines the use of entropy in the field of web usage mining. Entropy offers an alternative way of determining the ratio of auxiliary pages in session identification using the Reference Length method. The experiment was conducted on two different web portals. The first log file was obtained from a course in a virtual learning environment web portal. The second log file was obtained from a web portal with anonymous access. A comparison of the entropy-based estimate of the ratio of auxiliary pages with the sitemap-based estimate showed that, in the case of sitemap abundance, entropy could be a full substitute for the estimate of the ratio of auxiliary pages.
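The abstract does not say how entropy is mapped to the ratio of auxiliary pages, so the following is only a minimal sketch of the underlying quantity: the Shannon entropy of a page-visit distribution taken from a log. The function names and the sample log are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(page_visits):
    """Return the Shannon entropy (in bits) of the page-visit distribution."""
    counts = Counter(page_visits)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(page_visits):
    """Entropy divided by its maximum, log2(number of distinct pages)."""
    distinct = len(set(page_visits))
    if distinct < 2:
        # With zero or one distinct page there is no uncertainty to measure.
        return 0.0
    return shannon_entropy(page_visits) / math.log2(distinct)


# Hypothetical log: page identifiers in the order they were requested.
visits = ["home", "home", "course", "quiz", "home", "course", "home", "forum"]
print(round(shannon_entropy(visits), 3), round(normalized_entropy(visits), 3))
```

A normalized value near 1 means visits are spread evenly over the pages; a value near 0 means a few pages dominate.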

2.
PeerJ Comput Sci ; 10: e2026, 2024.
Article in English | MEDLINE | ID: mdl-38855261

ABSTRACT

Morphological tagging provides essential insights into grammar, structure, and the mutual relationships of words within the sentence. Tagging text in a highly inflectional language is a challenging task due to word ambiguity. This research aims to compare six different automatic taggers for the inflectional Slovak language, seeking the most accurate tagger for literary and non-literary texts. Our results indicate that it is useful to differentiate texts into literary and non-literary and then to deploy a tagger based on the text style. For literary texts, UDPipe2 outperformed the others in seven out of nine examined tagset positions. Conversely, for non-literary texts, the RNNTagger exhibited the highest performance in eight out of nine examined tagset positions. The RNNTagger is recommended for both types of text, as it best captures the inflection of the Slovak language, but UDPipe2 demonstrates a higher accuracy for literary texts. Despite dataset size limitations, this study emphasizes the suitability of various taggers for inflectional languages like Slovak.

3.
Sci Rep ; 14(1): 9293, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654050

ABSTRACT

The aim of the study is to compare two different approaches to machine translation (statistical and neural) using automatic MT metrics of error rate and residuals. We examined four available online MT systems (statistical Google Translate, neural Google Translate, and two European Commission MT tools: statistical mt@ec and neural eTranslation) through their products (MT outputs). We propose using residual analysis to improve the accuracy of machine translation error rate. Residuals represent a new approach to comparing the quality of statistical and neural MT outputs. The study provides new insights into evaluating machine translation quality from English and German into Slovak through automatic error rate metrics. In the category of prediction and syntactic-semantic correlativeness, statistical MT showed a significantly higher error rate than neural MT. Conversely, in the category of lexical semantics, neural MT showed a significantly higher error rate than statistical MT. The results indicate that relying solely on the reference when determining MT quality is insufficient. However, when combined with residuals, it offers a more objective view of MT quality and facilitates the comparison of statistical MT and neural MT.

4.
Front Psychol ; 14: 1272370, 2023.
Article in English | MEDLINE | ID: mdl-38259576

ABSTRACT

Introduction: Understanding how category width of cognitive style and power distance impact language use in cultures is crucial for improving cross-cultural communication. We attempt to reveal how English foreign language students, affected by high-context culture, communicate in English as a foreign language. What models of foreign communicative competence do they create? Methods: We applied association rule analysis to find out how the category width of cognitive style affects the foreign communication competence in relation to culture and language. Results: The requester tends to be more formal and transfers conventional norms of the culture of the mother tongue into English, which mainly affects the use of alerters and external modifications of the head act of request. Discussion: A broad categorizer, regardless of social distance, prefers to formulate the request in a conditional over the present tense form, contrary to narrow categorizers who, in a situation of social proximity, prefer the request form in the present tense. A similar finding was shown in the case of external modifications of the head act, where we observed the inversion between broad and narrow categorizers, mainly in the use of minimizers and mitigating devices.

5.
Sci Rep ; 13(1): 20123, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37978270

ABSTRACT

Parallel texts represent a very valuable resource in many applications of natural language processing. The fundamental step in creating a parallel corpus is alignment. Sentence alignment is the task of finding correspondences between source sentences and their equivalent translations in the target text. A number of automatic sentence alignment approaches have been proposed, including neural networks; they can be divided into length-based, lexicon-based, and translation-based. In our study, we used five different aligners, namely Bilingual sentence aligner (BSA), Hunalign, Bleualign, Vecalign, and Bertalign. We evaluated the accuracy of Bertalign against the previously employed aligners, and compared the aligners with each other, for the English-Slovak language pair. We created our custom corpus consisting of texts collected in 2021 and 2022. Vecalign and Bertalign performed statistically significantly best, and BSA the worst. Hunalign and Bleualign achieved the same performance in terms of F1 score. However, Bleualign achieved the most diverse results in terms of performance.
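To illustrate the F1 score mentioned above, here is a minimal sketch of precision, recall, and F1 for sentence alignment. It assumes, purely for illustration, that an alignment is a set of (source index, target index) pairs and that a predicted pair counts as correct only if it appears in the gold alignment; the study's actual scoring protocol is not described in the abstract.

```python
def alignment_f1(predicted, gold):
    """Return (precision, recall, F1) for predicted vs. gold alignment pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical alignments: (source sentence index, target sentence index).
gold = {(0, 0), (1, 1), (2, 2), (3, 3)}
predicted = {(0, 0), (1, 1), (2, 3)}
print(alignment_f1(predicted, gold))
```

Real aligners also emit one-to-many and many-to-one links; those would need a richer representation than the index pairs assumed here.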


Subjects
Language , Natural Language Processing , Slovakia , Neural Networks, Computer
6.
Procedia Comput Sci ; 207: 2618-2627, 2022.
Article in English | MEDLINE | ID: mdl-36275392

ABSTRACT

Background: The COVID-19 pandemic caused an infodemic, a massive spread of true and fake information about the novel coronavirus. This study aims to present the possibility of using Keyword Extraction as a tool to obtain the most trending search queries related to COVID-19 and to analyze the possibility of including their search volume in models for the prediction of fake news. Methods: The study used a Python implementation of the machine learning-based technique KeyBERT to extract keywords from true and fake news. These keywords were used in the next step to obtain related search queries with the Google Trends API. Results: The non-parametric Spearman rank-order correlation identified a statistically significant positive correlation (p < 0.001) between the occurrence of false news and the top query / rising query metrics provided by Google Trends for queries related to the extracted keywords pandemic, HIV, lockdown, plague, Michigan, and protest, which proves that search volume can identify fake news. Conclusions: Experiments done in this research proved that Keyword Extraction from false news is useful for obtaining related search queries, and that the top query and rising query metrics can be used to increase the accuracy of fake news prediction models.

7.
PeerJ Comput Sci ; 7: e706, 2021.
Article in English | MEDLINE | ID: mdl-34712792

ABSTRACT

The rapid technologisation of translation has influenced the translation industry's direction towards machine translation, post-editing, subtitling services and video content translation. In addition, the pandemic situation associated with COVID-19 has rapidly increased the transfer of business and education to the virtual world. This situation has motivated us not only to look for new approaches to online translator training, which requires a different method than learning foreign languages, but in particular to look for new approaches to assess translator performance within online educational environments. Translation quality assessment is a key task, as the concept of quality is closely linked to the concept of optimization. Automatic metrics are very good indicators of quality, but they do not provide sufficient and detailed linguistic information about translations or post-edited machine translations. However, using their residuals, we can identify the segments with the largest distances between the post-edited machine translations and machine translations, which allows us to focus on a more detailed textual analysis of suspicious segments. We introduce a unique online teaching and learning system, which is specifically "tailored" for online translators' training, and subsequently we focus on a new approach to assess translators' competences using evaluation techniques: the metrics of automatic evaluation and their residuals. We show that the residuals of the metrics of accuracy (BLEU_n) and error rate (PER, WER, TER, CDER, and HTER) for machine translation post-editing are valid for translator assessment. Using the residuals of the metrics of accuracy and error rate, we can identify errors in post-editing (critical, major, and minor) and subsequently utilize them in more detailed linguistic analysis.
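Of the error rate metrics listed above, WER (word error rate) is the simplest to show in code. The sketch below computes it as the word-level edit distance divided by the reference length. Whitespace tokenization and the choice of which text serves as the reference are assumptions made for illustration; the abstract does not define how residuals are derived, so they are not implemented here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical segment pair: a post-edited text as reference, raw MT as hypothesis.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Computing such a score per segment gives the kind of segment-level values from which outlying segments could then be picked out for closer linguistic analysis.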

8.
PeerJ Comput Sci ; 7: e624, 2021.
Article in English | MEDLINE | ID: mdl-34395862

ABSTRACT

Research into techniques for effective fake news detection has become both necessary and attractive. These techniques have a background in many research disciplines, including morphological analysis. Several researchers have stated that simple content-related n-grams and POS tagging had been proven insufficient for fake news classification. However, over the last decade they have not published empirical results that would confirm these statements experimentally. Considering this contradiction, the main aim of the paper is to experimentally evaluate the potential of the common use of n-grams and POS tags for the correct classification of fake and true news. The dataset of published fake or real news about the current Covid-19 pandemic was pre-processed using morphological analysis. As a result, n-grams of POS tags were prepared and further analysed. Three techniques based on POS tags were proposed and applied to different groups of n-grams in the pre-processing phase of fake news detection. The n-gram size was examined first. Subsequently, the most suitable depth of the decision trees for sufficient generalization was examined. Finally, the performance measures of models based on the proposed techniques were compared with the standardised reference TF-IDF technique. Performance measures of the models, such as accuracy, precision, recall and F1-score, are considered, together with the 10-fold cross-validation technique. At the same time, the question of whether the TF-IDF technique can be improved using POS tags was examined in detail. The results showed that the newly proposed techniques are comparable with the traditional TF-IDF technique. At the same time, it can be stated that the morphological analysis can improve the baseline TF-IDF technique. As a result, the performance measures of the model, precision for fake news and recall for real news, were statistically significantly improved.
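As a small illustration of what "n-grams of POS tags" means, the sketch below slides a window of size n over a tag sequence and counts the resulting tuples. The tag names and the sample sentence are hypothetical, and the tagging step itself (morphological analysis) is assumed to have been done already; this is not the paper's pre-processing pipeline.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Return the list of n-grams (as tuples) over a sequence of POS tags."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def ngram_counts(tags, n):
    """Count how often each POS-tag n-gram occurs in the sequence."""
    return Counter(pos_ngrams(tags, n))


# Hypothetical tag sequence for a five-word sentence.
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
print(ngram_counts(tags, 2))
```

Counts like these can serve as features for a classifier, in the same way term counts feed a TF-IDF representation.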

9.
Int J Neural Syst ; 31(10): 2150013, 2021 Oct.
Article in English | MEDLINE | ID: mdl-33573532

ABSTRACT

Automated sentiment analysis is becoming increasingly recognized due to the growing importance of social media and e-commerce platform review websites. Deep neural networks outperform traditional lexicon-based and machine learning methods by effectively exploiting contextual word embeddings to generate dense document representation. However, this representation model is not fully adequate to capture topical semantics and the sentiment polarity of words. To overcome these problems, a novel sentiment analysis model is proposed that utilizes richer document representations of word-emotion associations and topic models, which is the main computational novelty of this study. The sentiment analysis model integrates word embeddings with lexicon-based sentiment and emotion indicators, including negations and emoticons, and to further improve its performance, a topic modeling component is utilized together with a bag-of-words model based on a supervised term weighting scheme. The effectiveness of the proposed model is evaluated using large datasets of Amazon product reviews and hotel reviews. Experimental results prove that the proposed document representation is valid for the sentiment analysis of product and hotel reviews, irrespective of their class imbalance. The results also show that the proposed model improves on existing machine learning methods.


Subjects
Algorithms , Neural Networks, Computer , Emotions , Humans , Machine Learning , Semantics
10.
Data Brief ; 39: 107672, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34934786

ABSTRACT

The dataset presented in this article is the pre-processed web server log file of a commercial bank. The source of the data is the bank's web server, which recorded the accesses of web users from 2009 to 2012. It contains accesses to the bank website during and after the financial crisis. Unnecessary data saved by the web server were removed to keep the focus only on the textual content of the website. Many variables were added to the original log file to make the analysis workable. To protect the privacy of website users, sensitive information in the log file was anonymized. The dataset offers a way to understand the behavior of stakeholders during and after the crisis and how they comply with the Basel regulations. The behavior of users can be modeled using the multinomial logit model, which is described in detail in the research article [1] related to this data article.

11.
MethodsX ; 8: 101570, 2021.
Article in English | MEDLINE | ID: mdl-35004204

ABSTRACT

The methods presented in this article were created to model and describe the behaviour of the web users of a bank institution web portal. The source dataset is represented by a log file of the commercial bank web server. The analysis is oriented on examining the behaviour of visitors over an extended period (2009-2012). The years 2009-2010 represent the years of the financial crisis, and the years 2011-2012 represent the years after the financial crisis. The following method describes the sequence of steps necessary to pre-process the raw log file and model the web user behaviour using the multinomial logit model. The introduced methods can also be used for other domains, provided the data are prepared appropriately.
• Data preparation: data cleaning, user/session identification, path completion, variables determination;
• Data analysis: model definition, parameters estimation, logits estimation, probabilities estimation;
• Results evaluation: comparison of empirical and theoretical values in terms of counts, probabilities and logits.
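The step from estimated logits to estimated probabilities can be sketched as follows. In a multinomial logit model with a reference category, each logit is the log-odds of a category against the reference, whose own logit is fixed at 0. The category names and logit values below are hypothetical and are not taken from the article.

```python
import math


def probabilities_from_logits(logits, reference="reference"):
    """Convert logits (log-odds vs. the reference category) into probabilities.

    `logits` maps each non-reference category to its logit; the reference
    category has an implicit logit of 0.
    """
    if reference in logits:
        raise ValueError("the reference category must not appear in logits")
    denominator = 1.0 + sum(math.exp(value) for value in logits.values())
    probabilities = {cat: math.exp(value) / denominator for cat, value in logits.items()}
    probabilities[reference] = 1.0 / denominator
    return probabilities


# Hypothetical logits for two web-part categories against a reference category.
example = probabilities_from_logits({"annual_reports": 0.4, "pillar3": -0.7})
print(example)
```

The resulting probabilities always sum to 1, which makes them directly comparable with empirical relative frequencies in the evaluation step.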

12.
PLoS One ; 16(10): e0258449, 2021.
Article in English | MEDLINE | ID: mdl-34705858

ABSTRACT

The paper examines the interest of commercial banks' stakeholders in Pillar 3 disclosures and their behaviour during times of serious market turbulence. The aim is to discover to what extent current banking regulation supports stakeholders' interest in the information required by regulators to be disclosed. The examined data consist of log files that were pre-processed using web mining techniques and from which frequent item sets were extracted by quarter and evaluated in terms of quantity. The authors have proposed a methodology to evaluate frequent item sets of web parts over a dedicated time. Based on the verification of the applied methodology on two commercial banks, the results show that stakeholders' interest in disclosures is highest in the first quarter of each year and that, after the turbulent times in 2009, their interest decreased. Moreover, the results suggest that stakeholders expressed higher interest in the following groups of information than in the regulatory required Pillar 3 information: Pillar 3 related information, Annual reports, Information on Group. Following our results, the paper contributes to covering the gap in the research by analysing Pillar 3 disclosures and their compliance with regulatory requirements, which also increase the interest of the relevant stakeholders to conduce them as an effective market discipline tool.


Subjects
Disclosure
13.
Crit Care Explor ; 2(11): e0279, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33225305

ABSTRACT

OBJECTIVES: To propose the optimal timing to consider tracheostomy insertion for weaning of mechanically ventilated patients recovering from coronavirus disease 2019 pneumonia. We investigated the relationship between duration of mechanical ventilation prior to tracheostomy insertion and in-hospital mortality. In addition, we present a machine learning approach to facilitate decision-making. DESIGN: Prospective cohort study. SETTING: Guy's & St Thomas' Hospital, London, United Kingdom. PATIENTS: Consecutive patients admitted with acute respiratory failure secondary to coronavirus disease 2019 requiring mechanical ventilation between March 3, 2020, and May 5, 2020. INTERVENTIONS: Baseline characteristics and temporal trends in markers of disease severity were prospectively recorded. Tracheostomy was performed for anticipated prolonged ventilatory wean when levels of respiratory support were favorable. A decision tree was constructed using the C4.5 algorithm, and its classification performance was evaluated by a leave-one-out cross-validation technique. MEASUREMENTS AND MAIN RESULTS: One hundred seventy-six patients required mechanical ventilation for acute respiratory failure, of whom 87 patients (49.4%) underwent tracheostomy. We identified that optimal timing for tracheostomy insertion is between day 13 and day 17. Presence of fibrosis on CT scan (odds ratio, 13.26; 95% CI [3.61-48.91]; p ≤ 0.0001) and Pao2:Fio2 ratio (odds ratio, 0.98; 95% CI [0.95-0.99]; p = 0.008) were independently associated with tracheostomy insertion.
Cox multiple regression analysis showed that chronic obstructive pulmonary disease (hazard ratio, 6.56; 95% CI [1.04-41.59]; p = 0.046), ischemic heart disease (hazard ratio, 4.62; 95% CI [1.19-17.87]; p = 0.027), positive end-expiratory pressure (hazard ratio, 1.26; 95% CI [1.02-1.57]; p = 0.034), Pao2:Fio2 ratio (hazard ratio, 0.98; 95% CI [0.97-0.99]; p = 0.003), and C-reactive protein (hazard ratio, 1.01; 95% CI [1-1.01]; p = 0.005) were independent late predictors of in-hospital mortality. CONCLUSIONS: We propose that the optimal window for consideration of tracheostomy for ventilatory weaning is between day 13 and 17. Late predictors of mortality may serve as adverse factors when considering tracheostomy, and our decision tree provides a degree of decision support for clinicians.

14.
Int J Environ Res Public Health ; 11(6): 5628-39, 2014 May 26.
Article in English | MEDLINE | ID: mdl-24865398

ABSTRACT

Many toxic substances in the workplace can modify human health and quality of life, and there is still insufficient data on respiratory outcomes in adults exposed to phthalates. The aim of this work was to assess the extent of exposure to phthalates and health-related outcomes in waste management workers from the Nitra region of Slovakia (n = 30). Four urinary phthalate metabolites, mono(2-ethylhexyl) phthalate (MEHP), monobutyl phthalate (MnBP), monoethyl phthalate (MEP) and monoisononyl phthalate (MiNP), were determined by high-performance liquid chromatography with mass spectrometry (HPLC-MS/MS). Urinary concentration of MEHP was positively associated with the ratio of forced expiratory volume in 1 s to forced vital capacity % (FEV1/FVC) (r = 0.431; p = 0.018) and MiNP with fat free mass index (FFMI) (r = 0.439; p = 0.015). The strongest predictor of pulmonary function was the pack/year index as smoking history, which predicted a decrease of pulmonary parameters: the FEV1/FVC, % of predicted values of peak expiratory flow (PEF % of PV) and FEV1 % of PV. Unexpectedly, urinary MEHP and MiNP were positively associated with pulmonary function expressed as PEF % of PV and FEV1/FVC. We hypothesize that occupational exposure to phthalates estimated from urinary metabolites (MEHP, MiNP) can modify pulmonary function on top of lifestyle factors.


Subjects
Health Status , Lung/drug effects , Occupational Exposure/analysis , Phthalic Acids/adverse effects , Phthalic Acids/urine , Waste Management , Adult , Anthropometry , Biomarkers , Female , Humans , Linear Models , Lung/physiology , Male , Middle Aged , Occupational Exposure/adverse effects , Respiratory Function Tests , Slovakia , Spirometry