Results: 1 - 16 of 16
1.
Front Aging Neurosci; 16: 1341227, 2024.
Article in English | MEDLINE | ID: mdl-39081395

ABSTRACT

Objective: Early identification of cognitive impairment in older adults could reduce the burden of age-related disabilities. Gait parameters are associated with and predictive of cognitive decline. Although a variety of sensors and machine learning analysis methods have been used in cognitive studies, a deeply optimized machine vision-based method for analyzing gait to identify cognitive decline is still needed. Methods: This study used West China Hospital Elderly Gait, a walking-footage dataset of 158 adults labelled by performance on the Short Portable Mental Status Questionnaire. We propose a novel recognition network, Deep Optimized GaitPart (DO-GaitPart), based on silhouette and skeleton gait images. Three improvements were applied: a short-term temporal template generator (STTG) in the template-generation stage to decrease computational cost and minimize the loss of temporal information; a depth-wise spatial feature extractor (DSFE) to extract both global and local fine-grained spatial features from gait images; and multi-scale temporal aggregation (MTA), an attention-based temporal modeling method, to improve the distinguishability of gait patterns. Results: An ablation test showed that each component of DO-GaitPart was essential. DO-GaitPart outperformed the comparison methods (GaitSet, GaitPart, MT3D, 3D Local, TransGait, CSTL, GLN, GaitGL, and SMPLGait) in the backpack walking scene of the CASIA-B dataset and on the Gait3D dataset. The proposed machine vision gait feature identification method achieved an area under the receiver operating characteristic curve (ROC AUC) of 0.876 (0.852-0.900) on the cognitive state classification task. Conclusion: The proposed method performed well at identifying cognitive decline from gait video, making it a promising prototype tool for cognitive assessment.
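
The abstract does not include code; as a rough illustration of the multi-scale temporal aggregation (MTA) idea — attention-weighted pooling of per-frame gait features at several temporal scales — a minimal PyTorch sketch follows. The module name, scales, shapes, and the averaging fusion are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of attention-based multi-scale temporal aggregation (MTA).
import torch
import torch.nn as nn

class MultiScaleTemporalAggregation(nn.Module):
    def __init__(self, channels: int, scales=(1, 3, 5)):
        super().__init__()
        # One attention scorer per temporal scale (kernel size = scale).
        self.scorers = nn.ModuleList(
            nn.Conv1d(channels, 1, kernel_size=s, padding=s // 2) for s in scales
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) sequence of per-frame gait features.
        aggregated = []
        for scorer in self.scorers:
            attn = torch.softmax(scorer(x), dim=-1)        # (B, 1, T) weights
            aggregated.append((x * attn).sum(dim=-1))      # (B, C) weighted pool
        # Fuse the scales by averaging; the paper may use a learned fusion.
        return torch.stack(aggregated, dim=0).mean(dim=0)  # (B, C)

feats = torch.randn(2, 128, 30)          # 2 clips, 128-dim features, 30 frames
print(MultiScaleTemporalAggregation(128)(feats).shape)   # torch.Size([2, 128])
```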

2.
Comput Biol Med; 162: 107050, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37269680

ABSTRACT

Alzheimer's disease (AD) is a neurodegenerative disorder and the most common cause of dementia, so accurate diagnosis of AD and its prodromal stage, mild cognitive impairment (MCI), is important. Recent studies have demonstrated that multiple neuroimaging and biological measures contain complementary information for diagnosis. Many existing multi-modal models based on deep learning simply concatenate each modality's features despite substantial differences in their representation spaces. In this paper, we propose a novel multi-modal cross-attention AD diagnosis (MCAD) framework that learns the interactions between modalities so that their complementary roles can be better exploited, using multi-modal data comprising structural magnetic resonance imaging (sMRI), fluorodeoxyglucose positron emission tomography (FDG-PET), and cerebrospinal fluid (CSF) biomarkers. Specifically, the imaging and non-imaging representations are learned by an image encoder based on cascaded dilated convolutions and a CSF encoder, respectively. A multi-modal interaction module is then introduced, which uses cross-modal attention to integrate imaging and non-imaging information and reinforce the relationships between these modalities. Moreover, an extensive objective function is designed to reduce the discrepancy between modalities and fuse the features of multi-modal data effectively, which further improves diagnostic performance. We evaluate the proposed method on the ADNI dataset, and extensive experiments demonstrate that MCAD achieves superior performance on multiple AD-related classification tasks compared with several competing methods. We also investigate the importance of cross-attention and the contribution of each modality to diagnostic performance. The experimental results demonstrate that combining multi-modal data via cross-attention is helpful for accurate AD diagnosis.


Subjects
Alzheimer Disease; Cognitive Dysfunction; Humans; Alzheimer Disease/diagnostic imaging; Neuroimaging/methods; Magnetic Resonance Imaging/methods; Positron-Emission Tomography/methods; Cognitive Dysfunction/diagnostic imaging
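
As a rough illustration of the cross-modal attention described in the abstract above — imaging tokens querying non-imaging (CSF) tokens in a shared space — here is a minimal PyTorch sketch. All dimensions, module names, and the residual design are assumptions, not the paper's implementation.

```python
# Illustrative sketch of cross-modal attention between imaging and CSF features.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, img_dim=256, csf_dim=64, dim=128, heads=4):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, dim)   # map both modalities into
        self.csf_proj = nn.Linear(csf_dim, dim)   # a shared space first
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens, csf_tokens):
        # img_tokens: (B, N_img, img_dim), e.g. sMRI/FDG-PET patch features
        # csf_tokens: (B, N_csf, csf_dim), e.g. per-biomarker embeddings
        q = self.img_proj(img_tokens)
        kv = self.csf_proj(csf_tokens)
        # Imaging queries attend over non-imaging keys/values.
        fused, _ = self.attn(q, kv, kv)
        return fused + q                          # residual connection

img = torch.randn(2, 49, 256)
csf = torch.randn(2, 3, 64)
print(CrossModalAttention()(img, csf).shape)      # torch.Size([2, 49, 128])
```
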
3.
Math Biosci Eng; 20(3): 4912-4939, 2023 Jan 05.
Article in English | MEDLINE | ID: mdl-36896529

ABSTRACT

Chinese medical knowledge-based question answering (cMed-KBQA) is a vital component of intelligent question-answering systems. Its purpose is to enable a model to comprehend questions and then deduce the correct answer from a knowledge base. Previous methods considered only how questions and knowledge-base paths were represented, disregarding their relative significance, and because entities and paths are sparse, question-answering performance could not be effectively improved. To address this challenge, this paper presents a structured methodology for cMed-KBQA based on the dual-systems theory of cognitive science, coordinating an observation stage (System 1) and an expressive reasoning stage (System 2). System 1 learns the question's representation and queries the associated simple paths. System 2 then retrieves complicated paths for the question from the knowledge base, using the simple paths provided by System 1. Specifically, System 1 comprises an entity extraction module, an entity linking module, a simple path retrieval module, and a simple path-matching model, while System 2 comprises a complex path retrieval module and a complex path-matching model. The proposed method was evaluated extensively on the public CKBQA2019 and CKBQA2020 datasets. Measured by average F1-score, our model achieved 78.12% on CKBQA2019 and 86.60% on CKBQA2020.


Subjects
Knowledge Bases; Semantics; Information Storage and Retrieval; Problem Solving
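
The staged retrieval described in the abstract above can be pictured as a toy pipeline. The functions and toy knowledge base below are hypothetical placeholders for the paper's learned modules; the learned path-matching models that rank candidate paths against the question are omitted.

```python
# Schematic sketch of the two-stage (System 1 / System 2) cMed-KBQA pipeline.
# All functions below are hypothetical placeholders for the paper's modules.

def system1(question: str, kb: dict) -> list[tuple]:
    """Observation stage: extract/link entities, retrieve 1-hop (simple) paths."""
    entities = [e for e in kb if e in question]            # toy entity linking
    return [(e, rel, obj) for e in entities
            for rel, obj in kb[e]]                         # simple paths

def system2(question: str, simple_paths: list[tuple], kb: dict) -> list[tuple]:
    """Reasoning stage: expand simple paths into multi-hop (complex) paths."""
    complex_paths = []
    for (e, rel, obj) in simple_paths:
        for rel2, obj2 in kb.get(obj, []):                 # 2-hop expansion
            complex_paths.append((e, rel, obj, rel2, obj2))
    return complex_paths

# Toy knowledge base: entity -> [(relation, object), ...]
kb = {"diabetes": [("treated_by", "metformin")],
      "metformin": [("side_effect", "nausea")]}
paths = system1("What are side effects of diabetes treatment?", kb)
print(system2("...", paths, kb))
# [('diabetes', 'treated_by', 'metformin', 'side_effect', 'nausea')]
```
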
4.
Comput Methods Programs Biomed; 228: 107249, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36423486

ABSTRACT

BACKGROUND AND OBJECTIVE: The Chinese medical question answer matching (cMedQAM) task is an essential branch of medical question answering. Its goal is to accurately choose the correct response from a pool of candidate answers. The most effective existing methods are based on deep neural networks and attention mechanisms, which obtain rich question-and-answer representations. However, those methods overlook crucial characteristics of Chinese characters: glyphs and pinyin. Furthermore, they lose the local semantic information of phrases by generating attention information from relevant medical keywords alone. To address these challenges, we propose the multi-scale context-aware interaction approach based on multi-granularity embedding (MAGE). METHODS: We adapted ChineseBERT, which integrates Chinese character glyph and pinyin information into the language model, and fine-tuned it on a medical corpus. This addresses the pervasive phenomenon of homophones in Chinese. Moreover, we propose a context-aware interaction module to correctly align question and answer sequences and infer semantic relationships. Finally, we utilize a multi-view fusion method to combine local semantic features and attention representations. RESULTS: We conducted validation experiments on three publicly available datasets: cMedQA V1.0, cMedQA V2.0, and cEpilepsyQA. The proposed approach was evaluated by top-1 accuracy, reaching 74.1%, 82.7%, and 60.9% on the respective test sets. Experimental results on the three datasets demonstrate that MAGE achieves superior performance over state-of-the-art methods for the Chinese medical question answer matching task. CONCLUSIONS: The experimental results indicate that the proposed model can improve the accuracy of the Chinese medical question answer matching task. Therefore, it may be considered a potential intelligent assistant tool for future Chinese medical question answering systems.


Subjects
East Asian People; Language; Humans
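
A minimal sketch of the multi-granularity embedding idea above — fusing character, glyph, and pinyin information per token, in the spirit of ChineseBERT — follows. The table-lookup stand-ins, vocabulary sizes, and linear fusion are assumptions; ChineseBERT itself uses a CNN over glyph images and a pinyin sequence encoder.

```python
# Hypothetical sketch of multi-granularity character embeddings.
import torch
import torch.nn as nn

class MultiGranularityEmbedding(nn.Module):
    def __init__(self, vocab=21128, glyphs=21128, pinyins=1500, dim=768):
        super().__init__()
        self.char = nn.Embedding(vocab, dim)
        self.glyph = nn.Embedding(glyphs, dim)    # stand-in for a glyph CNN
        self.pinyin = nn.Embedding(pinyins, dim)  # stand-in for a pinyin encoder
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, char_ids, glyph_ids, pinyin_ids):
        x = torch.cat([self.char(char_ids),
                       self.glyph(glyph_ids),
                       self.pinyin(pinyin_ids)], dim=-1)
        return self.fuse(x)                       # (B, T, dim) fused embedding

ids = torch.randint(0, 1500, (2, 16))
print(MultiGranularityEmbedding()(ids, ids, ids).shape)  # torch.Size([2, 16, 768])
```
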
5.
Front Psychol; 13: 1001885, 2022.
Article in English | MEDLINE | ID: mdl-36438381

ABSTRACT

Background: Emotions play a decisive and central role in the workplace, especially in service-oriented enterprises. Because the service process is highly participatory and interactive, employees' emotions are usually highly volatile during service delivery, which can have a negative impact on business performance. It is therefore important to judge the emotional states of customer service staff effectively. Methods: We collected data on real-life work situations of call center employees at a large company. Three consecutive studies were conducted: first, the emotional states of 29 customer service staff were videotaped with wide-angle cameras. In Study 1, we constructed scoring criteria and auxiliary tools for picture-type scales through a free association test. In Study 2, two groups of experts were invited to evaluate the emotional states of the customer service staff. In Study 3, based on the results of Study 2 and a multimodal emotion recognition method, a multimodal dataset was constructed to explore how each modality conveys the emotions of customer service staff in the workplace. Results: Through scoring by two groups of experts and one group of volunteers, we first developed a set of scoring criteria and picture-type scales, combined with the SAM scale, for judging the emotional state of customer service staff. We then constructed 99 (out of 297) sets of stable multimodal emotion data. Comparison among the datasets showed that voice conveys emotional valence in the workplace more significantly, while facial expressions are more prominently connected with emotional arousal. Conclusion: Theoretically, this study enriches the way emotion data are collected and can provide a basis for the subsequent development of multimodal emotion datasets. Practically, it can guide the effective judgment of employee emotions in the workplace.

6.
Heliyon; 8(10): e11038, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36267375

ABSTRACT

Visual-based social group detection aims to cluster pedestrians in crowd scenes according to social interactions and spatio-temporal position relations using surveillance video data. It is a basic technique for crowd behaviour analysis and group-based activity understanding. According to proxemics, the interpersonal relationship between individuals determines the scope of their personal space, while spatial distance reflects the closeness of that relationship. In this paper, we propose a new unsupervised approach to interaction recognition and social group detection in public spaces, which removes the need for time-consuming, intensively labelled training data. First, based on pedestrians' spatio-temporal trajectories, the interpersonal distances among individuals are measured from both static and dynamic perspectives. Combined with proxemics theory, a social interaction recognition scheme is designed to judge whether there is a social interaction between pedestrians. On this basis, pedestrians are clustered to determine whether they form a social group. Extensive experiments on our pedestrian dataset "SCU-VSD-Social", annotated with multi-group labels, demonstrate that the proposed method has outstanding performance in both accuracy and complexity.
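
As a toy illustration of the distance-threshold-then-cluster scheme above: threshold pairwise interpersonal distance to build an interaction graph, then take connected components as groups. The threshold value and the mean-distance interaction cue are assumptions, not the paper's rules.

```python
# Illustrative sketch: proxemics-style grouping of pedestrian trajectories.
import itertools
import numpy as np

def social_groups(trajectories: dict, threshold: float = 1.2) -> list[set]:
    """trajectories: {pid: (T, 2) array of positions}; threshold in metres
    (~personal-space boundary). Returns a list of group member sets."""
    parent = {p: p for p in trajectories}

    def find(p):                                  # union-find with path halving
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in itertools.combinations(trajectories, 2):
        # Static cue: mean inter-pedestrian distance over the shared frames.
        ta, tb = trajectories[a], trajectories[b]
        n = min(len(ta), len(tb))
        mean_dist = np.linalg.norm(ta[:n] - tb[:n], axis=1).mean()
        if mean_dist < threshold:                 # interaction assumed
            parent[find(a)] = find(b)             # merge the two clusters

    groups = {}
    for p in trajectories:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

t = np.linspace(0, 5, 50)
trajs = {1: np.c_[t, np.zeros(50)],       # pedestrians 1 and 2 walk together
         2: np.c_[t, 0.8 * np.ones(50)],
         3: np.c_[t, 10 * np.ones(50)]}   # pedestrian 3 walks far away
print(social_groups(trajs))               # [{1, 2}, {3}]
```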

7.
Artif Intell Med; 131: 102346, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36100340

ABSTRACT

Medical visual question answering (Med-VQA) aims to accurately answer clinical questions about medical images. Despite its enormous potential for application in the medical domain, the technology is still in its infancy. Compared with the general visual question answering task, Med-VQA involves more demanding challenges. First, clinical questions about medical images are usually diverse, owing to differences among clinicians and the complexity of diseases, so noise is inevitably introduced when extracting question features. Second, Med-VQA has typically been treated as a classification problem over predefined answers, ignoring the relationships between candidate responses, so the model pays equal attention to all candidate answers when predicting. In this paper, a novel Med-VQA framework is proposed to alleviate these problems. Specifically, we employ separate question-type reasoning modules for closed-ended and open-ended questions, extracting the important information contained in the questions through an attention mechanism and filtering out noise to obtain more valuable question features. To take advantage of the relational information between answers, we design a semantic constraint space to calculate the similarity between answers and assign higher attention to highly correlated answers. To evaluate the effectiveness of the proposed method, extensive experiments were conducted on a public dataset, VQA-RAD. The proposed method outperformed other state-of-the-art methods, with overall, closed-ended, and open-ended accuracies of 74.1%, 82.7%, and 60.9%, respectively. Notably, the absolute accuracy of the proposed method improved by 5.5% for closed-ended questions.


Subjects
Semantics; Algorithms; Attention; Image Interpretation, Computer-Assisted/methods; Image Processing, Computer-Assisted/methods
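
One way to picture the "semantic constraint space" described above is to smooth each candidate answer's score with the scores of semantically similar answers. The sketch below is an assumption-laden illustration: the cosine similarity, softmax temperature, and mixing weight alpha are not from the paper.

```python
# Rough sketch of a semantic constraint over candidate answers.
import torch
import torch.nn.functional as F

def semantically_constrained_logits(logits, answer_emb, alpha=0.3):
    # logits: (B, A) raw classification scores over A candidate answers
    # answer_emb: (A, D) embeddings of the answer texts
    sim = F.cosine_similarity(answer_emb.unsqueeze(1),
                              answer_emb.unsqueeze(0), dim=-1)    # (A, A)
    sim = torch.softmax(sim / 0.1, dim=-1)   # sharpen: attend to similar answers
    # Each answer's logit borrows from the logits of its semantic neighbours.
    return (1 - alpha) * logits + alpha * logits @ sim.T          # (B, A)

logits = torch.randn(4, 500)                 # 500 candidate answers (illustrative)
emb = torch.randn(500, 300)
print(semantically_constrained_logits(logits, emb).shape)         # torch.Size([4, 500])
```
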
8.
Neural Netw; 155: 155-167, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36058021

ABSTRACT

Text-to-image synthesis is a fundamental and challenging task in computer vision that aims to synthesize realistic images from given descriptions. Recently, text-to-image synthesis methods have achieved great improvements in the quality of synthesized images. However, very few works have explored its application to face synthesis, which has great potential in face-related applications and the public safety domain. Moreover, the faces generated by existing methods are generally of poor quality and have low consistency with the given text. To tackle this issue, we build a novel end-to-end generative adversarial network with a dual-channel generator, named DualG-GAN, to improve the quality of the generated images and their consistency with the text description. In DualG-GAN, to improve the consistency between the synthesized image and the input description, a dual-channel generator block is introduced, and a novel loss is designed to improve the similarity between the generated image and the ground truth at three different semantic levels. Extensive experiments demonstrate that DualG-GAN achieves state-of-the-art results on the SCU-Text2face dataset. To further verify its performance, we compare DualG-GAN with the current optimal methods on text-to-image synthesis tasks, where quantitative and qualitative results show that it achieves optimal performance in both Fréchet inception distance (FID) and R-precision metrics. As only a few works focus on text-to-face synthesis, this work can serve as a baseline for future research.


Subjects
Image Processing, Computer-Assisted; Image Processing, Computer-Assisted/methods
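
The abstract names a loss over "three different semantic levels" without defining them; a hedged sketch of one plausible reading (pixel, feature, and text-semantic levels, with toy stand-in encoders and arbitrary weights) follows. None of the level definitions or weights below are confirmed by the paper.

```python
# Hedged sketch of a three-level similarity loss for text-to-face synthesis.
import torch
import torch.nn as nn

def multilevel_similarity_loss(fake, real, feat_net, text_emb, img_to_text,
                               w=(1.0, 0.5, 0.5)):
    pixel = nn.functional.l1_loss(fake, real)                         # pixel level
    feature = nn.functional.mse_loss(feat_net(fake), feat_net(real))  # feature level
    # Semantic level: generated image should embed close to its description.
    semantic = 1 - nn.functional.cosine_similarity(
        img_to_text(fake), text_emb, dim=-1).mean()
    return w[0] * pixel + w[1] * feature + w[2] * semantic

# Toy stand-ins for a pretrained feature extractor and an image-text encoder.
feat_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
img_to_text = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
fake, real = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
text_emb = torch.randn(2, 256)
print(multilevel_similarity_loss(fake, real, feat_net, text_emb, img_to_text))
```
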
9.
Math Biosci Eng; 19(10): 10192-10212, 2022 Jul 20.
Article in English | MEDLINE | ID: mdl-36031991

ABSTRACT

Medical visual question answering (Med-VQA) aims to leverage a pre-trained artificial intelligence model to answer clinical questions raised by doctors or patients about radiology images. However, owing to the high professional requirements of the medical field and the difficulty of annotating medical data, Med-VQA lacks sufficient large-scale, well-annotated radiology images for training. Researchers have mainly focused on improving the model's visual feature extractor to address this problem. However, few studies have focused on textual feature extraction, and most underestimate the interactions between corresponding visual and textual features. In this study, we propose a corresponding feature fusion (CFF) method to strengthen the interactions between specific features from corresponding radiology images and questions. In addition, we designed a semantic attention (SA) module for textual feature extraction, which helps the model consciously focus on the meaningful words in various questions while reducing the attention spent on insignificant information. Extensive experiments demonstrate that the proposed method achieves competitive results on two benchmark datasets and outperforms existing state-of-the-art methods in answer prediction accuracy. The experimental results also show that our model is capable of semantic understanding during answer prediction, which offers certain advantages in Med-VQA.


Subjects
Artificial Intelligence; Semantics; Algorithms; Attention; Humans
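
The abstract does not specify the form of the corresponding feature fusion; as one hedged illustration of "strengthening interactions" between modalities, the sketch below lets pooled visual and textual features gate each other element-wise before concatenation. The module name, gating form, and dimensions are assumptions.

```python
# Illustrative sketch of a cross-gated visual/textual feature fusion.
import torch
import torch.nn as nn

class CorrespondingFeatureFusion(nn.Module):
    def __init__(self, v_dim=1024, q_dim=1024, dim=512):
        super().__init__()
        self.v = nn.Linear(v_dim, dim)
        self.q = nn.Linear(q_dim, dim)
        self.gate_v = nn.Linear(dim, dim)   # text decides what to keep in vision
        self.gate_q = nn.Linear(dim, dim)   # vision decides what to keep in text

    def forward(self, v_feat, q_feat):
        v, q = self.v(v_feat), self.q(q_feat)
        v = v * torch.sigmoid(self.gate_v(q))   # cross-gated interactions
        q = q * torch.sigmoid(self.gate_q(v))
        return torch.cat([v, q], dim=-1)        # (B, 2*dim) fused representation

v = torch.randn(8, 1024)   # pooled image features
q = torch.randn(8, 1024)   # pooled question features
print(CorrespondingFeatureFusion()(v, q).shape)  # torch.Size([8, 1024])
```
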
10.
Sensors (Basel); 22(15), 2022 Aug 01.
Article in English | MEDLINE | ID: mdl-35957306

ABSTRACT

Social relationships refer to the connections that exist between people and indicate how people interact in society. Effective recognition of social relationships is conducive to further understanding human behavioral patterns and is thus vital for more complex social intelligent systems, such as interactive robots and health self-management systems. Existing work on social relation recognition (SRR) focuses on extracting features at different scales but lacks a comprehensive mechanism to orchestrate features of differing importance. In this paper, we propose a new SRR framework, Multi-level Transformer-Based Social Relation Recognition (MT-SRR), for better orchestrating features at different scales. Specifically, a vision transformer (ViT) is first employed as a feature extraction module for its advantage in exploiting global features. An intra-relation transformer (Intra-TRM) is then introduced to dynamically fuse the extracted features into more rational social relation representations. Next, an inter-relation transformer (Inter-TRM) is adopted to further enhance the social relation representations by attentionally exploiting the logical constraints among relationships. In addition, a new margin related to inter-class similarity and sample number is added to alleviate the challenges of data imbalance. Extensive experiments demonstrate that MT-SRR can better fuse features at different scales and mitigate the adverse effects of data imbalance. Results on the benchmark datasets show that our proposed model outperforms state-of-the-art methods with significant improvement.


Subjects
Benchmarking; Recognition, Psychology; Humans
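
As a rough sketch of the margin idea above — a larger margin for classes that are rare or easily confused with others — consider the following. The exact formula (cosine similarity, inverse-square-root rarity, margin added to competing logits) is an assumption for illustration, not the paper's definition.

```python
# Hedged sketch of a margin scaled by inter-class similarity and class rarity.
import torch
import torch.nn.functional as F

def adaptive_margin_loss(logits, labels, class_emb, class_counts, base=0.3):
    # logits: (B, C); class_emb: (C, D); class_counts: (C,) training frequencies
    sim = F.cosine_similarity(class_emb.unsqueeze(1),
                              class_emb.unsqueeze(0), dim=-1)     # (C, C)
    # Rare classes and classes similar to the true class get a larger margin.
    rarity = 1.0 / class_counts.float().sqrt()
    margin = base * sim[labels].clamp(min=0) * rarity[labels].unsqueeze(1)
    margin.scatter_(1, labels.unsqueeze(1), 0.0)  # no margin on the true class
    # Competing logits are raised by the margin, demanding a larger gap.
    return F.cross_entropy(logits + margin, labels)

logits = torch.randn(16, 6)
labels = torch.randint(0, 6, (16,))
class_emb = torch.randn(6, 32)
counts = torch.tensor([500, 50, 200, 10, 300, 80])
print(adaptive_margin_loss(logits, labels, class_emb, counts))
```
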
11.
J Neural Eng; 19(4), 2022 Aug 09.
Article in English | MEDLINE | ID: mdl-35882218

ABSTRACT

Objective: Alzheimer's disease (AD) is a degenerative brain disorder and one of the main causes of death in elderly people, so early diagnosis of AD is vital for prompt access to medication and medical care. Fluorodeoxyglucose positron emission tomography (FDG-PET) has proven effective for understanding neurological changes by measuring glucose uptake. Our aim is to explore information-rich regions of FDG-PET imaging that enhance the accuracy and interpretability of AD-related diagnosis. Approach: We develop a novel method for early diagnosis of AD based on multi-scale discriminative regions in FDG-PET imaging, which also considers diagnostic interpretability. Specifically, a multi-scale region localization module automatically identifies disease-related discriminative regions in full-volume FDG-PET images in an unsupervised manner, upon which a confidence score is designed to prioritize regions according to the density distribution of anomalies. The proposed multi-scale region classification module then adaptively fuses multi-scale region representations and performs decision fusion, which not only reduces useless information but also offers complementary information. Most previous methods concentrate on discriminating AD from cognitively normal (CN) subjects, whereas mild cognitive impairment, a transitional state, facilitates early diagnosis; our method is therefore further applied to multiple AD-related diagnosis tasks, not limited to AD vs. CN. Main results: Experimental results on the Alzheimer's Disease Neuroimaging Initiative dataset show that the proposed method achieves superior performance over state-of-the-art FDG-PET-based approaches. Moreover, some cerebral cortices highlighted by the extracted regions cohere with medical research, further demonstrating the method's strengths. Significance: This work offers an effective method for AD diagnosis and for detecting disease-affected regions in FDG-PET imaging. Our results could provide an additional opinion in clinical diagnosis.


Subjects
Alzheimer Disease; Cognitive Dysfunction; Aged; Alzheimer Disease/diagnostic imaging; Brain; Cognitive Dysfunction/diagnostic imaging; Early Diagnosis; Fluorodeoxyglucose F18; Humans; Positron-Emission Tomography/methods
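
A toy numeric sketch of the region-prioritisation idea above: score candidate sub-volumes of an anomaly map by their anomaly density. The mean-density score and box parameterisation are assumptions; the paper's confidence score may be defined differently.

```python
# Toy sketch: rank candidate 3D regions by anomaly density.
import numpy as np

def region_confidence(anomaly_map: np.ndarray, regions: list[tuple]) -> list[float]:
    """anomaly_map: 3D array of per-voxel anomaly values (e.g. deviation from
    a normal-cohort template); regions: (z0, z1, y0, y1, x0, x1) boxes."""
    scores = []
    for (z0, z1, y0, y1, x0, x1) in regions:
        sub = anomaly_map[z0:z1, y0:y1, x0:x1]
        scores.append(float(sub.mean()))     # density of anomalies in the box
    return scores

amap = np.zeros((32, 32, 32))
amap[4:12, 4:12, 4:12] = 1.0                 # synthetic "disease-affected" blob
boxes = [(0, 16, 0, 16, 0, 16), (16, 32, 16, 32, 16, 32)]
print(region_confidence(amap, boxes))        # [0.125, 0.0]
```
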
12.
Comput Methods Programs Biomed; 217: 106676, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35167997

ABSTRACT

BACKGROUND AND OBJECTIVE: Multi-modal medical images, such as magnetic resonance imaging (MRI) and positron emission tomography (PET), have been widely used for the diagnosis of brain disorders such as Alzheimer's disease (AD), since they provide complementary information. PET scans can detect cellular changes in organs and tissues earlier than MRI. Unlike MRI, however, PET data are difficult to acquire owing to cost, radiation exposure, and other limitations, and PET data are missing for many subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. To solve this problem, a 3D end-to-end generative adversarial network (BPGAN) is proposed to synthesize brain PET from MRI scans, which can serve as a potential data-completion scheme for multi-modal medical image research. METHODS: BPGAN learns an end-to-end mapping function that transforms input MRI scans into their underlying PET scans. First, we design a 3D multiple convolution U-Net (MCU) generator architecture to improve the visual quality of the synthetic results while preserving the diverse brain structures of different subjects. By further employing a 3D gradient profile (GP) loss and a structural similarity index measure (SSIM) loss, the synthetic PET scans achieve higher similarity to the ground truth. In this study, we also explore alternative data partitioning schemes to study their impact on the performance of the proposed method in different medical scenarios. RESULTS: We conducted experiments on the publicly available ADNI database. The proposed BPGAN was evaluated by mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and SSIM, and was superior to the compared models on these quantitative metrics. Qualitative evaluations also validate the effectiveness of our approach. Additionally, combining MRI with our synthetic PET scans improved the accuracy of multi-class AD diagnosis on dataset-A and dataset-B to 85.00% and 56.47%, respectively, an improvement of about 1% in each case over stand-alone MRI. CONCLUSIONS: The quantitative measures, qualitative displays, and classification evaluation demonstrate that the PET images synthesized by BPGAN are reasonable and of high quality, providing complementary information that improves AD diagnosis. This work provides a valuable reference for multi-modal medical image analysis.


Subjects
Alzheimer Disease; Alzheimer Disease/diagnostic imaging; Brain/diagnostic imaging; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Positron-Emission Tomography
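
A minimal sketch of the generator's auxiliary losses described above follows: a gradient-matching term in the spirit of a 3D gradient profile (GP) loss, implemented here with finite differences, plus an L1 reconstruction term. The paper's exact GP formulation may differ, the weights are illustrative, and the SSIM term is omitted for brevity.

```python
# Rough sketch of gradient-profile-style + L1 auxiliary losses for 3D volumes.
import torch

def gradient_profile_loss(fake: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
    # fake, real: (B, 1, D, H, W) synthetic and ground-truth PET volumes.
    loss = 0.0
    for dim in (2, 3, 4):                     # finite differences along D, H, W
        gf = fake.diff(dim=dim)
        gr = real.diff(dim=dim)
        loss = loss + (gf - gr).abs().mean()  # match edge/gradient structure
    return loss

def generator_aux_loss(fake, real, w_gp=1.0, w_l1=10.0):
    return w_gp * gradient_profile_loss(fake, real) + \
           w_l1 * (fake - real).abs().mean()

fake = torch.randn(1, 1, 16, 16, 16)
real = torch.randn(1, 1, 16, 16, 16)
print(generator_aux_loss(fake, real))
```
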
13.
IEEE Trans Neural Netw Learn Syst; 33(1): 430-444, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34793307

ABSTRACT

The amount of multimedia data, such as images and videos, has been increasing rapidly with the development of various imaging devices and the Internet, bringing more stress and challenges to information storage and transmission. The redundancy in images can be reduced to decrease data size via lossy compression, such as the most widely used standard, JPEG (Joint Photographic Experts Group). However, decompressed images generally suffer from various artifacts (e.g., blocking, banding, ringing, and blurring) due to the loss of information, especially at high compression ratios. This article presents a feature-enriched deep convolutional neural network for compression artifacts reduction (FeCarNet for short). Taking a dense network as the backbone, FeCarNet enriches features to gain valuable information by introducing multi-scale dilated convolutions, along with efficient 1×1 convolutions for lowering both parameter complexity and computation cost. Meanwhile, to make full use of the different levels of features in FeCarNet, a fusion block consisting of attention-based channel recalibration and dimension reduction is developed for local and global feature fusion. Furthermore, short and long residual connections in both the feature and pixel domains are combined to build a multi-level residual structure, benefiting network training and performance. In addition, to further reduce computation complexity, pixel-shuffle-based image downsampling and upsampling layers are arranged at the head and tail of FeCarNet, respectively, which also enlarges the receptive field of the whole network. Experimental results show the superiority of FeCarNet over state-of-the-art compression artifacts reduction approaches in terms of both restoration capacity and model complexity. Applications of FeCarNet to several computer vision tasks, including image deblurring, edge detection, image segmentation, and object detection, further demonstrate its effectiveness.
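
A minimal sketch of a feature-enrichment block in the spirit described above — parallel dilated 3×3 convolutions at several rates, concatenated and reduced by a 1×1 convolution, with a short residual connection — follows. The channel sizes, dilation rates, and block layout are illustrative assumptions, not FeCarNet's actual architecture.

```python
# Hedged sketch of a multi-scale dilated convolution block with 1x1 fusion.
import torch
import torch.nn as nn

class MultiScaleDilatedBlock(nn.Module):
    def __init__(self, channels=64, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates
        )
        # The 1x1 conv fuses branches while keeping parameters/computation low.
        self.reduce = nn.Conv2d(channels * len(rates), channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        multi = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        return x + self.reduce(multi)        # short residual connection

x = torch.randn(1, 64, 48, 48)               # a JPEG-degraded feature map
print(MultiScaleDilatedBlock()(x).shape)     # torch.Size([1, 64, 48, 48])
```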

14.
Front Psychol; 13: 1078691, 2022.
Article in English | MEDLINE | ID: mdl-36733871

ABSTRACT

Emotion measurement is crucial to conducting emotion research. Numerous studies have employed textual scales extensively for psychological and organizational behavior research. However, because emotions are transient organismic states of relatively short duration, textual scales have some insurmountable limitations, including low reliability for single measurements and susceptibility to learning effects under repeated use. In the present article, we introduce the Highly Dynamic and Reusable Picture-based Scale (HDRPS), which is randomly generated from 3,386 realistic, high-quality photographs divided into five categories (people, animals, plants, objects, and scenes). Affective ratings of the photographs were gathered from 14 experts and 209 professional judges. The HDRPS was validated against the Self-Assessment Manikin and the PANAS by 751 participants. With an accuracy of 89.73%, this new tool allows researchers to measure individual emotions continuously. The HDRPS is freely available for non-commercial academic research by request at http://syy.imagesoft.cc:8989/Pictures.7z. As some of the images were collected from the open Internet and their sources are difficult to trace, please contact the authors regarding any copyright issues.

15.
Front Aging Neurosci; 13: 757823, 2021.
Article in English | MEDLINE | ID: mdl-34867286

ABSTRACT

Background: Frail older adults have an increased risk of adverse health outcomes and premature death. They also exhibit altered gait characteristics in comparison with healthy individuals. Methods: In this study, we created a casual walking video set of older adults labelled with Fried's frailty phenotype (FFP), based on the West China Health and Aging Trend study. A series of hyperparameters in machine vision models were evaluated for body key point extraction (AlphaPose), silhouette segmentation (Pose2Seg, DPose2Seg, and Mask R-CNN), gait feature extraction (Gaitset, LGaitset, and DGaitset), and feature classification (AlexNet and VGG16), and were highly optimised during analysis of the gait sequences in the current dataset. Results: On the physical frailty state identification task, the area under the curve (AUC) of the receiver operating characteristic (ROC) was 0.851 (0.827-0.875) macro-averaged and 0.901 (0.878-0.920) micro-averaged for AlexNet, and 0.855 (0.834-0.877) macro-averaged and 0.905 (0.886-0.925) micro-averaged for VGG16. Furthermore, the machine vision method achieved better overall predictive performance than age, grip strength, and 4-m walking time in classifying healthy and pre-frail subjects. Conclusion: The gait analysis method presented in this article is, to our knowledge, previously unreported, and provides a promising, original tool for frailty and pre-frailty screening that is convenient, objective, rapid, and contact-free. These methods can be extended to the identification of any gait-related disease, as well as to in-home health monitoring.

16.
Sustain Cities Soc; 68: 102765, 2021 May.
Article in English | MEDLINE | ID: mdl-33585169

ABSTRACT

Social distancing in public spaces plays a crucial role in controlling or slowing the spread of coronavirus during the COVID-19 pandemic. Visual Social Distancing (VSD) offers an opportunity to measure and analyse, in real time, the physical distance between pedestrians in public spaces using surveillance videos, potentially providing new evidence for implementing effective prevention measures. The existing VSD methods in the literature are primarily based on frame-by-frame pedestrian detection, addressing the VSD problem from a static and local perspective. In this paper, we propose a new online multi-pedestrian tracking approach for spatio-temporal trajectories and apply it to multi-scale social distancing measurement and analysis. First, an online multi-pedestrian tracking method based on hierarchical data association is proposed to obtain the trajectories of pedestrians in public spaces. Then, a new VSD method based on spatio-temporal trajectories is proposed, which considers not only the frame-by-frame Euclidean distance between tracked objects but also the discrete Fréchet distance between trajectories, forming a comprehensive solution from static and dynamic, local and holistic perspectives. We evaluated the performance of the proposed tracking method on the public MOT16 benchmark. We also collected our own pedestrian dataset, "SCU-VSD", and designed a multi-scale VSD analysis scheme for benchmarking the performance of social distancing monitoring in crowds. Experiments demonstrate that the proposed method achieves outstanding performance in social distancing analysis.
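
The discrete Fréchet distance mentioned above — the dynamic, holistic trajectory cue — has a classic dynamic-programming formulation, sketched below. The algorithm itself is standard; only the toy trajectories and function names are illustrative.

```python
# Sketch of the discrete Fréchet distance between two trajectories.
import numpy as np

def discrete_frechet(P: np.ndarray, Q: np.ndarray) -> float:
    """P: (n, 2) and Q: (m, 2) trajectories; returns their coupling distance."""
    n, m = len(P), len(Q)
    ca = np.full((n, m), -1.0)
    d = lambda i, j: np.linalg.norm(P[i] - Q[j])
    ca[0, 0] = d(0, 0)
    for i in range(1, n):                    # first column: forced couplings
        ca[i, 0] = max(ca[i - 1, 0], d(i, 0))
    for j in range(1, m):                    # first row: forced couplings
        ca[0, j] = max(ca[0, j - 1], d(0, j))
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           d(i, j))
    return float(ca[-1, -1])

t = np.linspace(0, 10, 50)
walk_a = np.c_[t, np.zeros(50)]
walk_b = np.c_[t, 1.5 * np.ones(50)]       # walks 1.5 m apart, in parallel
print(discrete_frechet(walk_a, walk_b))    # 1.5
```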
