Eliminating Data Duplication in CQA Platforms Using Deep Neural Model.
Comput Intell Neurosci
; 2022: 2067449, 2022.
Article
en En
| MEDLINE
| ID: mdl-36059414
ABSTRACT
Prior research on detecting duplicate question pairs in community question answering (CQA) systems has relied on datasets made up of English questions only. This research puts forward a solution to duplicate question detection that matches semantically identical questions in transliterated bilingual data. Deep learning is applied to informal languages such as Hinglish, a bilingual mix of Hindi and English, to identify duplicate questions on CQA platforms. The proposed model works in two sequential modules. The first is a language transliteration module that converts input questions into monolingual text. The second takes the transliterated text and uses a multilayer hybrid deep learning model to detect duplicate questions in the monolingual data. Similarity between question pairs is computed by this hybrid model, which combines a Siamese neural network, with identical capsule networks as its subnetworks, and a decision tree classifier. The Manhattan distance function is used with the Siamese network to compute the similarity between questions. The proposed model was validated on 150 pairs of questions scraped from various social media platforms, such as Tripadvisor and Quora, and achieves an accuracy of 87.0885% and an AUC-ROC value of 0.86.
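The Manhattan-distance comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes each question has already been encoded into a fixed-length vector by one of the twin subnetworks; the example vectors, the function names, and the exp(-d) mapping from distance to a score in (0, 1] are illustrative assumptions (the mapping is a common choice in Siamese models), not details confirmed by the abstract.

```python
import math
from typing import Sequence


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two equal-length encodings."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def manhattan_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Map the L1 distance to a score in (0, 1].

    Assumption: exp(-d) is used as the mapping; the abstract only says
    that a Manhattan distance function is used with the Siamese network.
    """
    return math.exp(-manhattan_distance(a, b))


# Hypothetical encodings standing in for the twin subnetworks' outputs.
q1 = [0.2, 0.4, 0.1]
q2 = [0.2, 0.5, 0.1]
q3 = [0.9, 0.0, 0.7]

near = manhattan_similarity(q1, q2)  # close encodings, score near 1
far = manhattan_similarity(q1, q3)   # distant encodings, lower score
```

Identical encodings give a score of exactly 1.0, and the score decays toward 0 as the encodings move apart. How the score is thresholded or handed to the decision tree classifier is not specified in the abstract.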
Full text:
1
Collection:
01-internacional
Database:
MEDLINE
Main subject:
Neural Networks, Computer
/
Social Media
Type of study:
Prognostic_studies
Limits:
Humans
Language:
En
Journal:
Comput Intell Neurosci
Journal subject:
MEDICAL INFORMATICS
/
NEUROLOGY
Year:
2022
Document type:
Article
Affiliation country:
India