Eliminating Data Duplication in CQA Platforms Using Deep Neural Model.

Rani, Seema; Kumar, Avadhesh; Kumar, Naresh

Affiliation

Rani S; School of Computing Science & Engineering, Galgotias University, Greater Noida, Uttar Pradesh, India.
Kumar A; School of Computing Science & Engineering, Galgotias University, Greater Noida, Uttar Pradesh, India.
Kumar N; School of Computing Science & Engineering, Galgotias University, Greater Noida, Uttar Pradesh, India.

Comput Intell Neurosci; 2022: 2067449, 2022.

Article in English | MEDLINE | ID: mdl-36059414

ABSTRACT

Existing research on detecting duplicate question pairs in community-based question answering systems is based on datasets made up of English questions only. This research puts forward a solution to the problem of duplicate question detection by matching semantically identical questions in transliterated bilingual data. Deep learning is applied to informal languages such as Hinglish, a bilingual mix of Hindi and English, on Community Question Answering (CQA) platforms to identify duplicate questions. The proposed model works in two sequential modules. The first is a language transliteration module that converts input questions into monolingual text. The second takes the transliterated text and applies a multilayer hybrid deep learning model to detect duplicate questions in the monolingual data. Similarity between question pairs is computed with this hybrid model, which combines a Siamese neural network, with identical capsule networks as its subnetworks, and a decision tree classifier. The Manhattan distance function is used with the Siamese network to compute the similarity between questions. The proposed model was validated on 150 pairs of questions scraped from various social media platforms, such as Tripadvisor and Quora, achieving an accuracy of 87.0885% and an AUC-ROC value of 0.86.
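
The Manhattan-distance comparison mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the vectors `q1`, `q2`, and `q3` are hypothetical stand-ins for the encodings produced by the two identical subnetworks, and mapping the distance to a similarity score with `exp(-d)` is an assumed convention that the abstract does not specify.

```python
import math


def manhattan_distance(u, v):
    """Return the L1 (Manhattan) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def manhattan_similarity(u, v):
    """Map the L1 distance into (0, 1]; exp(-d) is an assumption, not from the abstract."""
    return math.exp(-manhattan_distance(u, v))


# Hypothetical question encodings; in a Siamese setup both questions pass
# through the same subnetwork with shared weights before being compared.
q1 = [0.20, 0.70, 0.10]
q2 = [0.25, 0.65, 0.10]  # close to q1, so it should score as more similar
q3 = [0.90, 0.05, 0.60]  # far from q1, so it should score as less similar

print(manhattan_similarity(q1, q2))
print(manhattan_similarity(q1, q3))
```

Identical encodings give a score of exactly 1, and the score decays toward 0 as the encodings move apart. In the model described above, a score of this kind would then feed the decision tree classifier rather than being thresholded directly.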

Subject(s)

Neural Networks, Computer; Social Media; Humans; Language

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Neural Networks, Computer / Social Media Type of study: Prognostic_studies Limits: Humans Language: En Journal: Comput Intell Neurosci Journal subject: MEDICAL INFORMATICS / NEUROLOGY Year: 2022 Document type: Article Affiliation country: India
