Results 1 - 20 of 32
1.
Article in English | MEDLINE | ID: mdl-38843429

ABSTRACT

Objective: This study aims to investigate the prevalence of post-traumatic stress disorder (PTSD) in patients with acute myocardial infarction (AMI) after percutaneous coronary intervention (PCI), and to analyze the correlation between self-efficacy and PTSD in these patients. Methods: This study focused on 268 AMI patients admitted to our hospital between April 2019 and March 2022. We administered the Posttraumatic Stress Disorder Checklist-Civilian Version (PCL-C) as a questionnaire survey and analyzed the correlations among self-efficacy, postoperative fatigue, and PTSD using Pearson correlation analysis. Additionally, we established a structural equation model (SEM) using Amos 21.0 software and conducted a mediation effect test. Results: (1) Among the 268 AMI patients, the PTSD score after PCI was 36.62 ± 4.62, the fatigue score was 8.62 ± 0.82, and the self-efficacy score was 19.34 ± 2.24. (2) Gender, educational level, and complications were influencing factors of PTSD in AMI patients (P < .05). (3) Pearson analysis showed that PTSD after PCI in AMI patients was positively correlated with fatigue and negatively correlated with self-efficacy; fatigue was negatively correlated with self-efficacy (both P < .01). (4) The mediating effect of self-efficacy between fatigue and PTSD in AMI patients after PCI was established, with a mediating effect value of 29.31%. Conclusion: PTSD, fatigue, and self-efficacy after PCI in AMI patients are all at moderate levels and warrant clinical attention. The 29.31% mediating effect between fatigue and PTSD confirms that fatigue can affect PTSD by regulating self-efficacy.
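
To make the statistics concrete, here is a minimal Python sketch of the analysis pipeline described above: Pearson correlations plus a bootstrap test of the indirect (mediation) effect. All data are synthetic stand-ins generated from the reported means and standard deviations; the study itself fitted the SEM in Amos 21.0.

```python
# Pearson correlation plus a percentile-bootstrap mediation test on
# synthetic stand-in data (the real study used patient scores and Amos SEM).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 268
fatigue = rng.normal(8.62, 0.82, n)
self_efficacy = 25 - 0.6 * fatigue + rng.normal(0, 1.5, n)  # assumed negative link
ptsd = 20 + 1.2 * fatigue - 0.5 * self_efficacy + rng.normal(0, 3, n)

r, p = stats.pearsonr(fatigue, ptsd)
print(f"fatigue vs PTSD: r={r:.2f}, p={p:.3g}")

def indirect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                  # path a: x -> mediator
    b = np.linalg.lstsq(np.c_[x, m, np.ones_like(x)], y, rcond=None)[0][1]  # path b
    return a * b                                 # indirect effect a*b

boot = [indirect(*(arr[idx] for arr in (fatigue, self_efficacy, ptsd)))
        for idx in (rng.integers(0, n, n) for _ in range(2000))]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect 95% CI: [{lo:.3f}, {hi:.3f}]")
```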

2.
IEEE Trans Fuzzy Syst ; 29(1): 34-45, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33408453

ABSTRACT

Traditional deep learning methods are suboptimal at classifying ambiguous features, which often arise in noisy and hard-to-predict categories, especially in semantic scoring. Semantic scoring, which depends on semantic logic to implement evaluation, inevitably contains fuzzy descriptions and misses some concepts; for example, the ambiguous relationship between normal and probably normal always presents unclear boundaries (normal - more likely normal - probably normal). Thus, human error is common when annotating images. Differing from existing methods that focus on modifying the kernel structure of neural networks, this study proposes a dominant fuzzy fully connected layer (FFCL) for Breast Imaging Reporting and Data System (BI-RADS) scoring and validates the universality of the proposed structure. The proposed model aims to develop complementary properties of scoring for semantic paradigms, while constructing fuzzy rules based on analyzing human thought patterns, and in particular to reduce the influence of semantic conglutination. Specifically, a semantic-sensitive defuzzifier layer projects features occupied by relative categories into a semantic space, and a fuzzy decoder modifies the probabilities of the last output layer with reference to the global trend. Moreover, the ambiguous semantic space between two relative categories shrinks during the learning phases, as the positive and negative growth trends of one category among its relatives are considered. We first used the Euclidean Distance (ED) to quantify the distance between the real scores and the predicted scores, and then employed a two-sample t-test to demonstrate the advantage of the FFCL architecture. Extensive experiments on the CBIS-DDSM dataset show that our FFCL structure achieves superior performance for both triple and multiclass classification in BI-RADS scoring, outperforming state-of-the-art methods.
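
As a rough illustration of the fuzzy-output idea (not the paper's exact FFCL), the following sketch softens hard class boundaries by mixing each predicted probability with its semantically adjacent categories, mimicking the ambiguity between neighboring BI-RADS scores; the mixing weights are a hand-set assumption.

```python
# A "fuzzy" output layer for ordered categories: each class shares
# probability mass with its adjacent scores via a row-stochastic matrix.
import torch
import torch.nn as nn

class FuzzyOutput(nn.Module):
    def __init__(self, in_features, n_classes, neighbor_weight=0.2):
        super().__init__()
        self.fc = nn.Linear(in_features, n_classes)
        # Tri-diagonal mixing matrix: mass leaks to adjacent categories only.
        m = torch.eye(n_classes) * (1 - 2 * neighbor_weight)
        m += torch.diag(torch.full((n_classes - 1,), neighbor_weight), 1)
        m += torch.diag(torch.full((n_classes - 1,), neighbor_weight), -1)
        self.register_buffer("mix", m / m.sum(dim=1, keepdim=True))

    def forward(self, x):
        p = torch.softmax(self.fc(x), dim=-1)
        return p @ self.mix   # fuzzified probabilities over ordered scores

probs = FuzzyOutput(64, 5)(torch.randn(2, 64))
print(probs.sum(dim=-1))  # each row still sums to 1
```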

3.
IEEE Trans Pattern Anal Mach Intell ; 46(8): 5504-5523, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38354073

ABSTRACT

Point clouds have garnered increasing research attention and found numerous practical applications. However, many of these applications, such as autonomous driving and robotic manipulation, rely on sequential point clouds, which essentially add a temporal dimension to the data (i.e., four dimensions), because the information that static point clouds can provide is still limited. Recent research efforts have been directed towards enhancing the understanding and utilization of sequential point clouds. This paper offers a comprehensive review of deep learning methods applied to sequential point cloud research, encompassing dynamic flow estimation, object detection and tracking, point cloud segmentation, and point cloud forecasting. It further summarizes and compares the quantitative results of the reviewed methods on public benchmark datasets. Finally, the paper concludes by addressing the challenges in current sequential point cloud research and pointing towards promising avenues for future work.

4.
Pattern Recognit Lett ; 34(10): 1130-1137, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23710105

ABSTRACT

In this paper, we propose a texture representation framework that maps local texture patches into a low-dimensional texture subspace. In natural texture images, textons are entangled with multiple factors, such as rotation, scaling, viewpoint variation, illumination change, and non-rigid surface deformation. Mapping local texture patches into a low-dimensional subspace can alleviate or eliminate these undesired variation factors arising from both geometric and photometric transformations. We observe that texture representations based on subspace embeddings have strong resistance to image deformations while being more distinctive and more compact than traditional representations. We investigate both linear and non-linear embedding methods, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Locality Preserving Projections (LPP), to compute the essential texture subspace. Experiments on texture classification over benchmark datasets demonstrate that the proposed subspace embedding representations achieve state-of-the-art results with far fewer feature dimensions.
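
The core embedding step is straightforward to sketch: flatten local texture patches and project them into a low-dimensional PCA subspace. The patch size and subspace dimensionality below are illustrative choices, and the patches are synthetic; the paper also evaluates LDA and LPP embeddings.

```python
# Project flattened texture patches into a low-dimensional PCA subspace.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
patches = rng.random((500, 15, 15))       # 500 synthetic 15x15 texture patches
X = patches.reshape(len(patches), -1)     # flatten to 225-D vectors

pca = PCA(n_components=20).fit(X)
Z = pca.transform(X)                      # 20-D texture subspace codes
print(Z.shape, pca.explained_variance_ratio_.sum())
```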

5.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4302-4320, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35877805

ABSTRACT

Understanding human behavior and activity facilitates the advancement of numerous real-world applications and is critical for video analysis. Despite the progress of action recognition algorithms on trimmed videos, the majority of real-world videos are lengthy and untrimmed, with sparse segments of interest. The task of temporal activity detection in untrimmed videos aims to localize the temporal boundaries of actions and to classify the action categories. This task has been investigated under both full and limited supervision, depending on the availability of action annotations. This article provides an extensive overview of deep learning-based algorithms for temporal action detection in untrimmed videos at different supervision levels: fully-supervised, weakly-supervised, unsupervised, self-supervised, and semi-supervised. In addition, it reviews advances in spatio-temporal action detection, where actions are localized in both temporal and spatial dimensions. Action detection in the online setting is also reviewed, where the goal is to detect actions in each frame of a live video stream without considering any future context. Moreover, the commonly used action detection benchmark datasets and evaluation metrics are described, and the performance of state-of-the-art methods is compared. Finally, real-world applications of temporal action detection in untrimmed videos and a set of future directions are discussed.

6.
Bioengineering (Basel) ; 10(1), 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36671688

ABSTRACT

Early intervention in kidney cancer helps to improve survival rates. Abdominal computed tomography (CT) is often used to diagnose renal masses. In clinical practice, the manual segmentation and quantification of organs and tumors are expensive and time-consuming. Artificial intelligence (AI) has shown a significant advantage in assisting cancer diagnosis. To reduce the workload of manual segmentation and avoid unnecessary biopsies or surgeries, in this paper we propose a novel end-to-end, AI-driven automatic kidney and renal mass diagnosis framework that identifies abnormal areas of the kidney and diagnoses the histological subtypes of renal cell carcinoma (RCC). The proposed framework first segments the kidney and renal mass regions with a 3D deep learning architecture (Res-UNet), followed by a dual-path classification network that utilizes local and global features to predict the most common RCC subtypes: clear cell, chromophobe, oncocytoma, papillary, and other RCC subtypes. To improve the robustness of the framework on data collected from various institutions, a weakly supervised learning schema is proposed to bridge the domain gap between vendors using very few CT slice annotations. Our proposed diagnosis system accurately segments the kidney and renal mass regions and predicts tumor subtypes, outperforming existing methods on the KiTS19 dataset. Furthermore, cross-dataset validation results demonstrate the robustness of the framework on datasets collected from different institutions when trained via the weakly supervised learning schema.
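
The dual-path classification idea can be sketched as follows: one branch processes a local tumor crop and another the global kidney region, with their pooled features fused for subtype prediction. Layer widths and pooling choices are assumptions, not the paper's exact architecture.

```python
# Schematic dual-path classifier: local crop + global ROI, fused features.
import torch
import torch.nn as nn

class DualPathClassifier(nn.Module):
    def __init__(self, n_subtypes=5):
        super().__init__()
        def branch():  # tiny 3D conv branch ending in global average pooling
            return nn.Sequential(
                nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.local, self.global_ = branch(), branch()
        self.head = nn.Linear(32, n_subtypes)

    def forward(self, local_crop, global_roi):
        feats = torch.cat([self.local(local_crop), self.global_(global_roi)], dim=1)
        return self.head(feats)

logits = DualPathClassifier()(torch.randn(1, 1, 32, 32, 32),
                              torch.randn(1, 1, 64, 64, 64))
print(logits.shape)  # (1, 5) subtype logits
```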

7.
IEEE Trans Syst Man Cybern C Appl Rev ; 42(6): 1021-1030, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22661884

ABSTRACT

We develop a novel camera-based computer vision technology to automatically recognize banknotes to assist visually impaired people. Our banknote recognition system is robust and effective, with the following features: 1) high accuracy: a high true recognition rate and a low false recognition rate; 2) robustness: it handles a variety of currency designs and bills in various conditions; 3) high efficiency: it recognizes banknotes quickly; and 4) ease of use: it helps blind users aim the target for image capture. To make the system robust to a variety of conditions, including occlusion, rotation, scaling, cluttered background, illumination change, viewpoint variation, and worn or wrinkled bills, we propose a component-based framework using Speeded-Up Robust Features (SURF). Furthermore, we employ the spatial relationship of matched SURF features to detect whether there is a bill in the camera view. This process largely alleviates false recognition and can guide the user to correctly aim at the bill to be recognized. The robustness and generalizability of the proposed system are evaluated on a dataset including both positive images (with U.S. banknotes) and negative images (no U.S. banknotes) collected under a variety of conditions. The proposed algorithm achieves a 100% true recognition rate and a 0% false recognition rate. Our banknote recognition system has also been tested by blind users.
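
A hedged sketch of the feature-matching stage follows, using ORB as a freely available stand-in for SURF (SURF lives in OpenCV's non-free contrib module); the image paths are placeholders, and the ratio-test threshold is an illustrative choice.

```python
# Keypoint matching between a reference bill and a camera frame.
import cv2

ref = cv2.imread("reference_bill.png", cv2.IMREAD_GRAYSCALE)    # placeholder path
query = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)    # placeholder path

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(ref, None)
kp2, des2 = orb.detectAndCompute(query, None)

# Hamming-distance brute-force matching with a ratio test to prune
# ambiguous correspondences before any spatial-consistency check.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} consistent matches")  # threshold on this to accept a bill
```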

8.
Front Radiol ; 2: 1041518, 2022.
Article in English | MEDLINE | ID: mdl-37492669

ABSTRACT

Medical imaging data annotation is expensive and time-consuming. Supervised deep learning approaches may encounter overfitting when trained with limited medical data, which further affects the robustness of computer-aided diagnosis (CAD) on CT scans collected by various scanner vendors. Additionally, the high false-positive rate of automatic lung nodule detection methods prevents their application in daily clinical routine diagnosis. To tackle these issues, we first introduce a novel self-learning schema that trains a pre-trained model by learning rich feature representations from large-scale unlabeled data without extra annotation, which guarantees consistent detection performance over novel datasets. Then, a 3D feature pyramid network (3DFPN) is proposed for high-sensitivity nodule detection by extracting multi-scale features, where the weights of the backbone network are initialized by the pre-trained model and then fine-tuned in a supervised manner. Further, a High Sensitivity and Specificity (HS2) network is proposed to reduce false positives by tracking the appearance changes among continuous CT slices on Location History Images (LHI) for the detected nodule candidates. The proposed method's performance and robustness are evaluated on several publicly available datasets, including LUNA16, SPIE-AAPM, LungTIME, and HMS. Our detector achieves the state-of-the-art result of 90.6% sensitivity at 1/8 false positives per scan on the LUNA16 dataset. The framework's generalizability has been evaluated on three additional datasets (i.e., SPIE-AAPM, LungTIME, and HMS) captured by different types of CT scanners.
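
The lateral-connection building block of a 3D feature pyramid can be sketched as below; the channel sizes are illustrative assumptions, not those of the paper's 3DFPN.

```python
# One lateral connection of a 3D feature pyramid: project both levels to a
# common width, upsample the coarse map, and fuse by addition.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Lateral3D(nn.Module):
    def __init__(self, c_fine, c_coarse, c_out=64):
        super().__init__()
        self.fine = nn.Conv3d(c_fine, c_out, 1)      # 1x1x1 channel projection
        self.coarse = nn.Conv3d(c_coarse, c_out, 1)

    def forward(self, fine, coarse):
        up = F.interpolate(self.coarse(coarse), size=fine.shape[2:],
                           mode="trilinear", align_corners=False)
        return self.fine(fine) + up                   # fused multi-scale features

fused = Lateral3D(32, 128)(torch.randn(1, 32, 32, 64, 64),
                           torch.randn(1, 128, 16, 32, 32))
print(fused.shape)  # (1, 64, 32, 64, 64)
```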

9.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1638-1652, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32822292

ABSTRACT

Text instances, as one category of self-describing objects, provide valuable information for understanding and describing cluttered scenes. The rich and precise high-level semantics embodied in text can drastically benefit the understanding of the world around us. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated texts and predicting unambiguous scene text information, i.e., accurately localizing and recognizing a specific targeted text instance in a cluttered image from natural language descriptions (referring expressions). To address this problem, first a novel recurrent dense text localization network (DTLN) is proposed to sequentially decode the intermediate convolutional representations of a cluttered scene image into a set of distinct text instance detections. Our approach avoids repeated text detections at multiple scales by recurrently memorizing previous detections, and effectively tackles crowded text instances in close proximity. Second, we propose a context reasoning text retrieval (CRTR) model, which jointly encodes text instances and their context information through a recurrent network, and ranks localized text bounding boxes by a scoring function of context compatibility. Third, a recurrent text recognition module is introduced to extend the applicability of the aforementioned DTLN and CRTR models via text verification or transcription. Quantitative evaluations on standard scene text extraction benchmarks and a newly collected scene text retrieval dataset demonstrate the effectiveness and advantages of our models for the joint scene text localization, retrieval, and recognition task.


Subjects
Algorithms ; Semantics
10.
Comput Med Imaging Graph ; 100: 102094, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35914340

ABSTRACT

Contrast agents are commonly used to highlight blood vessels, organs, and other structures in magnetic resonance imaging (MRI) and computed tomography (CT) scans. However, these agents may cause allergic reactions or nephrotoxicity, limiting their use in patients with kidney dysfunction. In this paper, we propose a generative adversarial network (GAN)-based framework to automatically synthesize contrast-enhanced CTs directly from non-contrast CTs of the abdomen and pelvis region. Respiratory and peristaltic motion can affect the pixel-level mapping of contrast-enhanced learning, which makes this task more challenging than for other body parts. A perceptual loss is introduced to compare high-level semantic differences of the enhancement areas between the virtual and actual contrast-enhanced CT images. Furthermore, to accurately synthesize intensity details as well as retain the texture structures of CT images, a dual-path training schema is proposed to learn the texture and structure features simultaneously. Experimental results on three contrast phases (i.e., arterial, portal, and delayed) show the potential to synthesize virtual contrast-enhanced CTs directly from non-contrast CTs of the abdomen and pelvis for clinical evaluation.
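
The perceptual loss mentioned above is commonly implemented by comparing features from a fixed pretrained network; the following sketch uses VGG16 features, which is an assumption on our part since the abstract does not name a backbone.

```python
# Perceptual loss: L1 distance between frozen VGG16 features of the
# synthesized and real contrast-enhanced CT slices.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

features = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in features.parameters():
    p.requires_grad_(False)

def perceptual_loss(fake_ct, real_ct):
    # Replicate single-channel CT slices to the 3 channels VGG expects.
    f = features(fake_ct.repeat(1, 3, 1, 1))
    r = features(real_ct.repeat(1, 3, 1, 1))
    return F.l1_loss(f, r)

loss = perceptual_loss(torch.rand(1, 1, 224, 224), torch.rand(1, 1, 224, 224))
print(loss.item())
```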


Subjects
Abdomen ; Tomography, X-Ray Computed ; Abdomen/diagnostic imaging ; Humans ; Image Processing, Computer-Assisted/methods ; Magnetic Resonance Imaging ; Pelvis/diagnostic imaging ; Pelvis/pathology ; Tomography, X-Ray Computed/methods
11.
Technol Disabil ; 23(2): 75-85, 2011.
Article in English | MEDLINE | ID: mdl-22523465

ABSTRACT

Matching clothes is a challenging task for many blind people. In this paper, we present a proof-of-concept system to solve this problem. The system consists of 1) a camera connected to a computer to perform the pattern- and color-matching process; 2) speech commands for system control and configuration; and 3) audio feedback that provides matching results for both the colors and patterns of clothes. The system can handle clothes of uniform color without any pattern, as well as clothing with multiple colors and complex patterns, to aid both blind and color-deficient people. Furthermore, our method is robust to variations in illumination, clothing rotation, and wrinkling. To evaluate the proposed prototype, we collected two challenging databases including clothes without any pattern, or with multiple colors and different patterns, under different lighting and rotation conditions. The results reported here demonstrate the robustness and effectiveness of the proposed clothing matching system.
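
One plausible building block of the color-matching step is a comparison of HSV color histograms, sketched below; the bin counts and correlation metric are illustrative choices, and the file names are placeholders.

```python
# Compare the color palettes of two clothing images via HSV histograms.
import cv2

def color_similarity(path_a, path_b, bins=32):
    hists = []
    for path in (path_a, path_b):
        img = cv2.imread(path)
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        # 2D histogram over hue (0-180) and saturation (0-256).
        h = cv2.calcHist([hsv], [0, 1], None, [bins, bins], [0, 180, 0, 256])
        hists.append(cv2.normalize(h, None).flatten())
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_CORREL)

print(color_similarity("shirt.jpg", "pants.jpg"))  # 1.0 = identical palettes
```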

12.
IEEE Trans Pattern Anal Mach Intell ; 43(11): 4037-4058, 2021 Nov.
Article in English | MEDLINE | ID: mdl-32386141

ABSTRACT

Large-scale labeled data are generally required to train deep neural networks to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, as a subset of unsupervised learning methods, have been proposed to learn general image and video features from large-scale unlabeled data without any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminology of this field are described. Then, the common deep neural network architectures used for self-supervised learning are summarized. Next, the schema and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used datasets for images, videos, audio, and 3D data, as well as the existing self-supervised visual feature learning methods. Then, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning. Finally, the paper concludes with a set of promising future directions for self-supervised visual feature learning.

13.
Comput Med Imaging Graph ; 87: 101817, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33278767

ABSTRACT

Lung segmentation in computed tomography (CT) images plays an important role in the diagnosis of various lung diseases. Most current lung segmentation approaches are performed through a series of procedures with manual, empirical parameter adjustments at each step. Pursuing an automatic segmentation method with fewer steps, we propose a novel deep learning Generative Adversarial Network (GAN)-based lung segmentation schema, which we denote as LGAN. The proposed schema can be generalized to different kinds of neural networks for lung segmentation in CT images. We evaluated LGAN on datasets including the Lung Image Database Consortium image collection (LIDC-IDRI) and the Quantitative Imaging Network (QIN) collection with two metrics: segmentation quality and shape similarity. We also compared our work with current state-of-the-art methods. The experimental results demonstrate that the proposed LGAN schema can serve as a promising tool for automatic lung segmentation due to its simplified procedure as well as its improved performance and efficiency.
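
The segmentation-quality metric family referenced above can be illustrated with a standard Dice score between a predicted and a ground-truth lung mask; the sketch below uses synthetic masks and is not tied to the paper's exact metric definitions.

```python
# Dice coefficient between two binary masks (synthetic data for illustration).
import numpy as np

def dice(pred, gt, eps=1e-7):
    pred, gt = pred.astype(bool), gt.astype(bool)
    return (2.0 * np.logical_and(pred, gt).sum() + eps) / (pred.sum() + gt.sum() + eps)

rng = np.random.default_rng(0)
pred = rng.random((128, 128)) > 0.5   # stand-in predicted mask
gt = rng.random((128, 128)) > 0.5     # stand-in ground-truth mask
print(f"Dice: {dice(pred, gt):.3f}")
```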


Subjects
Image Processing, Computer-Assisted ; Tomography, X-Ray Computed ; Databases, Factual ; Lung/diagnostic imaging ; Neural Networks, Computer
14.
IEEE Trans Image Process ; 29: 225-236, 2020.
Article in English | MEDLINE | ID: mdl-31329556

ABSTRACT

Deep neural network-based semantic segmentation generally requires large-scale, costly annotations for training to obtain better performance. To avoid the pixel-wise segmentation annotations needed by most methods, some researchers have recently attempted to use object-level labels (e.g., bounding boxes) or image-level labels (e.g., image categories). In this paper, we propose a novel recursive coarse-to-fine semantic segmentation framework based only on image-level category labels. For each image, an initial coarse mask is first generated by a convolutional neural network-based unsupervised foreground segmentation model and then enhanced by a graph model. The enhanced coarse mask is fed to a fully convolutional neural network to be recursively refined. Unlike existing image-level label-based semantic segmentation methods, which require labeling all categories for images that contain multiple types of objects, our framework needs only one label per image and can handle images containing multi-category objects. Trained only on ImageNet, our framework achieves performance on the PASCAL VOC dataset comparable to other image-level label-based state-of-the-art semantic segmentation methods. Furthermore, our framework can be easily extended to the foreground object segmentation task and achieves performance comparable to state-of-the-art supervised methods on the Internet object dataset.
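
The recursive refinement loop can be sketched in a few lines: the current mask is concatenated with the image, passed through a segmentation network, and the output is reused as the next iteration's mask. The toy two-layer network below is a stand-in assumption for the paper's fully convolutional model.

```python
# Toy recursive coarse-to-fine refinement loop.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

image = torch.rand(1, 3, 64, 64)
mask = torch.full((1, 1, 64, 64), 0.5)       # initial coarse mask
with torch.no_grad():
    for _ in range(3):                        # recursive refinement passes
        mask = net(torch.cat([image, mask], dim=1))
print(mask.shape)                             # refined mask, (1, 1, 64, 64)
```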

15.
IEEE Trans Image Process ; 28(11): 5241-5252, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31135361

ABSTRACT

In this paper, a self-guiding multimodal LSTM (sgLSTM) image captioning model is proposed to handle an uncontrolled, imbalanced real-world image-sentence dataset. We collect the FlickrNYC dataset from Flickr as our testbed, with 306,165 images whose original user-uploaded text descriptions are utilized as the ground truth for training. Descriptions in FlickrNYC vary dramatically, ranging from short phrases to long paragraphs, and can describe any visual aspect or even refer to objects that are not depicted. To deal with this imbalanced and noisy situation and to fully explore the dataset itself, we propose a novel guiding textual feature extracted using a multimodal LSTM (mLSTM) model. Training of the mLSTM is based on the portion of the data in which the image content and the corresponding descriptions are strongly bonded. Afterward, during the training of the sgLSTM on the remaining training data, this guiding information serves as additional input to the network along with the image representations and the ground-truth descriptions. By integrating these input components into a multimodal block, we aim to form a training scheme in which the textual information is tightly coupled with the image content. The experimental results demonstrate that the proposed sgLSTM model outperforms the traditional state-of-the-art multimodal RNN captioning framework in successfully describing the key components of the input images.

16.
Article in English | MEDLINE | ID: mdl-31369378

ABSTRACT

Text instances provide valuable information for the understanding and interpretation of natural scenes. The rich, precise high-level semantics embodied in text can benefit the understanding of the world around us and empower a wide range of real-world applications. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated texts and predicting an unambiguous scene text segmentation mask, i.e., scene text segmentation from natural language descriptions (referring expressions) such as "orange text on a little boy in black swinging a bat". The solution to this novel problem enables accurate segmentation of scene text instances from complex backgrounds. In our proposed framework, a unified deep network jointly models visual and linguistic information by encoding both region-level and pixel-level visual features of natural scene images into spatial feature maps, and then decodes them into a saliency response map of text instances. To conduct quantitative evaluations, we establish a new scene text referring expression segmentation dataset: COCO-CharRef. Experimental results demonstrate the effectiveness of the proposed framework on the text instance segmentation task. By combining image-based visual features with language-based textual explanations, our framework outperforms baselines derived from state-of-the-art text localization and natural language object retrieval methods on the COCO-CharRef dataset.

17.
IEEE Trans Mob Comput ; 18(3): 702-714, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30774566

ABSTRACT

This paper presents a new holistic vision-based mobile assistive navigation system to help blind and visually impaired people with independent indoor travel. The system detects dynamic obstacles and adjusts path planning in real time to improve navigation safety. First, we develop an indoor map editor to parse geometric information from architectural models and generate a semantic map consisting of a global 2D traversable grid-map layer and context-aware layers. By leveraging the visual positioning service (VPS) within the Google Tango device, we design a map alignment algorithm to bridge the visual area description file (ADF) and the semantic map to achieve semantic localization. Using the on-board RGB-D camera, we develop an efficient obstacle detection and avoidance approach based on a time-stamped map Kalman filter (TSM-KF) algorithm. A multi-modal human-machine interface (HMI) is designed with speech-audio interaction and robust haptic interaction through an electronic SmartCane. Finally, field experiments with blindfolded and blind subjects demonstrate that the proposed system provides an effective tool to help blind individuals with indoor navigation and wayfinding.
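
The Kalman-filtering idea behind the TSM-KF obstacle tracker can be illustrated with a 1-D constant-position filter smoothing noisy obstacle-distance readings; this is a deliberate simplification of the paper's time-stamped map formulation.

```python
# 1-D Kalman filter smoothing noisy obstacle-distance measurements.
import numpy as np

def kalman_1d(measurements, q=1e-3, r=0.05):
    x, p = measurements[0], 1.0          # state estimate and its variance
    out = []
    for z in measurements:
        p += q                            # predict: process noise inflates variance
        k = p / (p + r)                   # Kalman gain
        x += k * (z - x)                  # update with measurement z
        p *= (1 - k)
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(0)
noisy = 2.0 + 0.2 * rng.standard_normal(50)   # obstacle ~2 m away
print(kalman_1d(noisy)[-5:])                   # estimates settle near 2.0
```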

18.
IEEE Trans Pattern Anal Mach Intell ; 39(5): 1028-1039, 2017 May.
Article in English | MEDLINE | ID: mdl-28113701

ABSTRACT

The advent of cost-effective and easy-to-operate depth cameras has facilitated a variety of visual recognition tasks, including human activity recognition. This paper presents a novel framework for recognizing human activities from video sequences captured by depth cameras. We extend the surface normal to the polynormal by assembling local neighboring hypersurface normals from a depth sequence to jointly characterize local motion and shape information. We then propose a general scheme of the super normal vector (SNV) to aggregate the low-level polynormals into a discriminative representation, which can be viewed as a simplified version of the Fisher kernel representation. To globally capture the spatial layout and temporal order, an adaptive spatio-temporal pyramid is introduced to subdivide a depth video into a set of space-time cells. In extensive experiments, the proposed approach achieves superior performance to state-of-the-art methods on four public benchmark datasets, i.e., MSRAction3D, MSRDailyActivity3D, MSRGesture3D, and MSRActionPairs3D.
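
The first step behind polynormals, estimating per-pixel surface normals from a depth map, can be sketched with finite differences; the grouping of neighboring normals into polynormals and the SNV aggregation are omitted here.

```python
# Per-pixel surface normals from a (synthetic, planar) depth map.
import numpy as np

depth = np.fromfunction(lambda y, x: 0.01 * x + 0.02 * y, (120, 160))

dzdx = np.gradient(depth, axis=1)          # finite-difference depth gradients
dzdy = np.gradient(depth, axis=0)
normals = np.dstack([-dzdx, -dzdy, np.ones_like(depth)])
normals /= np.linalg.norm(normals, axis=2, keepdims=True)
print(normals[60, 80])   # constant over a planar surface, as expected
```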

19.
IEEE Trans Neural Netw Learn Syst ; 26(9): 2200-5, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25420271

ABSTRACT

A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. Learning-based classifiers achieve state-of-the-art accuracies but have been criticized for computational complexity that grows linearly with the number of classes. Nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, the discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree grows only sublinearly with the number of categories, which is much better than recent hierarchical support vector machine-based methods. Its memory requirement is an order of magnitude less than that of recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves state-of-the-art accuracies with significantly lower computation cost and memory requirements.
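
A plain (non-discriminative) hierarchical k-means tree is easy to sketch and shows why lookup cost grows sublinearly with the number of leaves; the discriminative weighting that distinguishes the D-HKTree is omitted here.

```python
# Hierarchical k-means tree: recursive k-way splits, greedy root-to-leaf lookup.
import numpy as np
from sklearn.cluster import KMeans

def build_hk_tree(X, k=4, min_leaf=20, depth=0, max_depth=5):
    if len(X) <= min_leaf or depth >= max_depth:
        return {"leaf": X}
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(X)
    return {"centers": km.cluster_centers_,
            "children": [build_hk_tree(X[km.labels_ == i], k, min_leaf,
                                       depth + 1, max_depth)
                         for i in range(k)]}

def lookup(tree, q):
    while "leaf" not in tree:          # descend toward the nearest center
        i = np.argmin(np.linalg.norm(tree["centers"] - q, axis=1))
        tree = tree["children"][i]
    return tree["leaf"]

X = np.random.default_rng(0).random((1000, 16))
print(len(lookup(build_hk_tree(X), X[0])))   # small candidate set, not all 1000
```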

20.
Biol Psychol ; 65(1): 49-66, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14638288

ABSTRACT

The assumption that the smile is an evolved facial display suggests that there may be universal features of smiling in addition to the basic facial configuration. We show that smiles include not only a stable configuration of features, but also temporally consistent movement patterns. In spontaneous smiles from two social contexts, duration of lip corner movement during the onset phase was independent of social context and the presence of other facial movements, including dampening. These additional movements produced variation in both peak and offset duration. Both onsets and offsets had dynamic properties similar to automatically controlled movements, with a consistent relation between maximum velocity and amplitude of lip corner movement in smiles from two distinct contexts. Despite the effects of individual and social factors on facial expression timing overall, consistency in onset and offset phases suggests that portions of the smile display are relatively stereotyped and may be automatically produced.
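
The velocity-amplitude measurement described above can be illustrated on a synthetic lip-corner displacement trace: compute the movement amplitude and the maximum velocity of the onset phase, whose consistent relation the study reports.

```python
# Amplitude and peak velocity of a synthetic smile-onset displacement trace.
import numpy as np

t = np.linspace(0, 0.5, 100)                              # 0.5 s onset, 200 Hz
displacement = 5.0 * (1 - np.cos(np.pi * t / 0.5)) / 2    # mm, smooth rise to 5 mm

amplitude = displacement.max() - displacement.min()
velocity = np.gradient(displacement, t)                   # mm/s
print(f"amplitude={amplitude:.2f} mm, peak velocity={velocity.max():.2f} mm/s")
```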


Subjects
Nonverbal Communication ; Smiling ; Social Behavior ; Adult ; Biological Evolution ; Emotions ; Female ; Humans ; Lip ; Male ; Movement ; Perception