Results 1 - 20 of 32
1.
Article in English | MEDLINE | ID: mdl-38843429

ABSTRACT

Objective: This study investigates the prevalence of post-traumatic stress disorder (PTSD) in patients with acute myocardial infarction (AMI) after percutaneous coronary intervention (PCI), and analyzes the correlation between self-efficacy and PTSD in these patients. Methods: The study enrolled 268 AMI patients admitted to our hospital between April 2019 and March 2022. We administered the Posttraumatic Stress Disorder Scale-Civilian Version (PCL-C) as a questionnaire survey and analyzed the correlations among self-efficacy, postoperative fatigue, and PTSD using Pearson correlation analysis. Additionally, we established a structural equation model (SEM) using Amos 21.0 software and conducted a mediation effect test. Results: (1) Among the 268 AMI patients, the PTSD score after PCI was 36.62 ± 4.62, the fatigue score was 8.62 ± 0.82, and the self-efficacy score was 19.34 ± 2.24. (2) Gender, educational level, and complications were influencing factors of PTSD in AMI patients (P < .05). (3) Pearson analysis showed that PTSD after PCI in AMI patients was positively correlated with fatigue and negatively correlated with self-efficacy; fatigue was negatively correlated with self-efficacy (both P < .01). (4) Self-efficacy mediated the relationship between fatigue and PTSD in AMI patients after PCI, with a mediating effect of 29.31%. Conclusion: PTSD, fatigue, and self-efficacy after PCI in AMI patients are all at moderate levels and warrant clinical attention. The 29.31% mediating effect between fatigue and PTSD confirms that fatigue can affect PTSD by regulating self-efficacy.
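As a minimal illustration of the product-of-coefficients mediation analysis described above (the study used Amos 21.0 for SEM; the synthetic data, effect sizes, and variable names below are assumptions for demonstration only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the study variables (assumed, illustration only):
# fatigue -> self-efficacy (mediator) -> PTSD, plus a direct fatigue -> PTSD path.
n = 268
fatigue = rng.normal(8.6, 0.8, n)
self_efficacy = 25 - 0.6 * fatigue + rng.normal(0, 1.5, n)
ptsd = 30 + 1.2 * fatigue - 0.5 * self_efficacy + rng.normal(0, 3.0, n)

# Pearson correlations, as in the paper's step (3).
print(stats.pearsonr(fatigue, ptsd))        # expected positive
print(stats.pearsonr(self_efficacy, ptsd))  # expected negative

# Product-of-coefficients mediation:
# a: fatigue -> mediator; b: mediator -> PTSD controlling for fatigue;
# c: total effect of fatigue on PTSD.
a = stats.linregress(fatigue, self_efficacy).slope
X = np.column_stack([np.ones(n), fatigue, self_efficacy])
b = np.linalg.lstsq(X, ptsd, rcond=None)[0][2]
c = stats.linregress(fatigue, ptsd).slope

print(f"proportion mediated: {a * b / c:.2%}")
```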

2.
Article in English | MEDLINE | ID: mdl-38354073

ABSTRACT

Point clouds have garnered increasing research attention and found numerous practical applications. However, many of these applications, such as autonomous driving and robotic manipulation, rely on sequential point clouds, which essentially add a temporal dimension to the data (i.e., four dimensions), because the information a static point cloud can provide is limited. Recent research efforts have been directed towards enhancing the understanding and utilization of sequential point clouds. This paper offers a comprehensive review of deep learning methods applied to sequential point cloud research, encompassing dynamic flow estimation, object detection and tracking, point cloud segmentation, and point cloud forecasting. The paper further summarizes and compares the quantitative results of the reviewed methods on public benchmark datasets. Finally, it concludes by addressing the challenges in current sequential point cloud research and pointing towards promising avenues for future work.
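As a toy illustration of the dynamic flow estimation task this review covers, per-point 3D flow between two consecutive frames can be approximated by nearest-neighbor association; this is a deliberately naive baseline, not one of the reviewed methods:

```python
import numpy as np
from scipy.spatial import cKDTree

def naive_scene_flow(frame_t, frame_t1):
    """Approximate per-point flow from frame_t (N, 3) to frame_t1 (M, 3)
    by matching each point to its nearest neighbor in the next frame."""
    tree = cKDTree(frame_t1)
    _, idx = tree.query(frame_t, k=1)
    return frame_t1[idx] - frame_t  # (N, 3) flow vectors

# A sequential point cloud adds a temporal axis: here, two frames of a
# point set rigidly translated by (0.1, 0, 0).
frame_t = np.random.rand(1000, 3)
frame_t1 = frame_t + np.array([0.1, 0.0, 0.0])
print(naive_scene_flow(frame_t, frame_t1).mean(axis=0))  # ~[0.1, 0, 0]
```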

3.
Bioengineering (Basel) ; 10(1), 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36671688

ABSTRACT

Early intervention in kidney cancer helps to improve survival rates. Abdominal computed tomography (CT) is often used to diagnose renal masses. In clinical practice, the manual segmentation and quantification of organs and tumors are expensive and time-consuming. Artificial intelligence (AI) has shown a significant advantage in assisting cancer diagnosis. To reduce the workload of manual segmentation and avoid unnecessary biopsies or surgeries, in this paper, we propose a novel end-to-end AI-driven automatic kidney and renal mass diagnosis framework to identify the abnormal areas of the kidney and diagnose the histological subtypes of renal cell carcinoma (RCC). The proposed framework first segments the kidney and renal mass regions with a 3D deep learning architecture (Res-UNet), followed by a dual-path classification network that utilizes local and global features for subtype prediction of the most common RCCs: clear cell, chromophobe, oncocytoma, papillary, and other RCC subtypes. To improve the robustness of the proposed framework on data collected from various institutions, a weakly supervised learning schema is proposed to bridge the domain gap between vendors using very few CT slice annotations. Our proposed diagnosis system can accurately segment the kidney and renal mass regions and predict tumor subtypes, outperforming existing methods on the KiTS19 dataset. Furthermore, cross-dataset validation results demonstrate the robustness of the framework on datasets collected from different institutions when trained via the weakly supervised learning schema.
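A minimal sketch of the kind of 3D residual building block a Res-UNet-style segmentation backbone stacks; the channel sizes, normalization choice, and layout here are generic assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    """3D residual block: two conv-norm-ReLU layers plus a skip connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv3d(out_ch, out_ch, 3, padding=1)
        self.norm1 = nn.InstanceNorm3d(out_ch)
        self.norm2 = nn.InstanceNorm3d(out_ch)
        self.skip = nn.Conv3d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        h = self.act(self.norm1(self.conv1(x)))
        h = self.norm2(self.conv2(h))
        return self.act(h + self.skip(x))

block = ResBlock3D(1, 16)
ct_patch = torch.randn(1, 1, 32, 64, 64)  # (batch, channel, depth, H, W)
print(block(ct_patch).shape)              # torch.Size([1, 16, 32, 64, 64])
```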

4.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4302-4320, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35877805

ABSTRACT

Understanding human behavior and activity facilitates the advancement of numerous real-world applications and is critical for video analysis. Despite the progress of action recognition algorithms on trimmed videos, the majority of real-world videos are lengthy and untrimmed, with sparse segments of interest. The task of temporal activity detection in untrimmed videos aims to localize the temporal boundaries of actions and classify the action categories. Temporal activity detection has been investigated under full and limited supervision settings, depending on the availability of action annotations. This article provides an extensive overview of deep learning-based algorithms for temporal action detection in untrimmed videos at different supervision levels, including fully-supervised, weakly-supervised, unsupervised, self-supervised, and semi-supervised. In addition, this article reviews advances in spatio-temporal action detection, where actions are localized in both temporal and spatial dimensions. Action detection in the online setting is also reviewed, where the goal is to detect actions in each frame of a live video stream without considering any future context. Moreover, the commonly used action detection benchmark datasets and evaluation metrics are described, and the performance of state-of-the-art methods is compared. Finally, real-world applications of temporal action detection in untrimmed videos and a set of future directions are discussed.
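Evaluation in this area typically hinges on the temporal intersection-over-union (tIoU) between predicted and ground-truth segments; a minimal sketch of this standard metric (not tied to any single reviewed method):

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A predicted segment overlapping a ground-truth action:
print(temporal_iou((12.0, 20.0), (15.0, 25.0)))  # 0.3846...
```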

5.
Comput Med Imaging Graph ; 100: 102094, 2022 09.
Article in English | MEDLINE | ID: mdl-35914340

ABSTRACT

Contrast agents are commonly used to highlight blood vessels, organs, and other structures in magnetic resonance imaging (MRI) and computed tomography (CT) scans. However, these agents may cause allergic reactions or nephrotoxicity, limiting their use in patients with kidney dysfunction. In this paper, we propose a generative adversarial network (GAN)-based framework to automatically synthesize contrast-enhanced CTs directly from non-contrast CTs of the abdomen and pelvis. Respiratory and peristaltic motion can affect the pixel-level mapping of contrast-enhanced learning, which makes this task more challenging than for other body parts. A perceptual loss is introduced to compare high-level semantic differences of the enhancement areas between the virtual and actual contrast-enhanced CT images. Furthermore, to accurately synthesize intensity details as well as preserve the texture structures of CT images, a dual-path training schema is proposed to learn texture and structure features simultaneously. Experimental results on three contrast phases (i.e., arterial, portal, and delayed) show the potential to synthesize virtual contrast-enhanced CTs directly from non-contrast CTs of the abdomen and pelvis for clinical evaluation.


Subject(s)
Abdomen; Tomography, X-Ray Computed; Abdomen/diagnostic imaging; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging; Pelvis/diagnostic imaging; Pelvis/pathology; Tomography, X-Ray Computed/methods
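A minimal sketch of a VGG-based perceptual loss like the one this record describes; the choice of feature layer and the L1 distance are assumptions, and the paper's exact formulation may differ:

```python
import torch
import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    """L1 distance between VGG16 feature maps of synthesized and real images."""
    def __init__(self, layer_idx=16):  # up to relu3_3 in torchvision's VGG16
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features
        self.extractor = vgg[:layer_idx].eval()
        for p in self.extractor.parameters():
            p.requires_grad = False

    def forward(self, synthesized, real):
        # VGG expects 3 channels; replicate single-channel CT slices.
        if synthesized.shape[1] == 1:
            synthesized = synthesized.repeat(1, 3, 1, 1)
            real = real.repeat(1, 3, 1, 1)
        return nn.functional.l1_loss(self.extractor(synthesized),
                                     self.extractor(real))

loss_fn = PerceptualLoss()
fake_ct, real_ct = torch.rand(2, 1, 256, 256), torch.rand(2, 1, 256, 256)
print(loss_fn(fake_ct, real_ct))
```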
6.
Front Radiol ; 2: 1041518, 2022.
Article in English | MEDLINE | ID: mdl-37492669

ABSTRACT

Medical imaging data annotation is expensive and time-consuming. Supervised deep learning approaches may encounter overfitting when trained with limited medical data, which further affects the robustness of computer-aided diagnosis (CAD) on CT scans collected by various scanner vendors. Additionally, the high false-positive rate of automatic lung nodule detection methods prevents their application in routine clinical diagnosis. To tackle these issues, we first introduce a novel self-learning schema that trains a pre-trained model by learning rich feature representations from large-scale unlabeled data without extra annotation, which guarantees consistent detection performance over novel datasets. Then, a 3D feature pyramid network (3DFPN) is proposed for high-sensitivity nodule detection by extracting multi-scale features, where the weights of the backbone network are initialized by the pre-trained model and then fine-tuned in a supervised manner. Further, a High Sensitivity and Specificity (HS2) network is proposed to reduce false positives by tracking the appearance changes across continuous CT slices on Location History Images (LHI) for the detected nodule candidates. The proposed method's performance and robustness are evaluated on several publicly available datasets, including LUNA16, SPIE-AAPM, LungTIME, and HMS. Our detector achieves the state-of-the-art result of 90.6% sensitivity at 1/8 false positive per scan on the LUNA16 dataset. The framework's generalizability has been evaluated on three additional datasets (i.e., SPIE-AAPM, LungTIME, and HMS) captured by different types of CT scanners.
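The headline metric here, sensitivity at a fixed number of false positives per scan, can be computed from scored candidates as in this simplified sketch of FROC-style evaluation (the candidate scores and labels are synthetic):

```python
import numpy as np

def sensitivity_at_fp_rate(scores, labels, n_scans, fp_per_scan=0.125):
    """Sensitivity when the threshold admits at most fp_per_scan * n_scans
    false positives. labels: 1 = true nodule, 0 = false positive."""
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    fp_cum = np.cumsum(labels == 0)
    keep = fp_cum <= fp_per_scan * n_scans  # candidates under the FP budget
    return labels[keep].sum() / labels.sum()

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 2000)
scores = labels * 0.5 + rng.random(2000) * 0.6  # informative but noisy scores
print(sensitivity_at_fp_rate(scores, labels, n_scans=888))
```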

7.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1638-1652, 2022 03.
Article in English | MEDLINE | ID: mdl-32822292

ABSTRACT

Text instances, as one category of self-described objects, provide valuable information for understanding and describing cluttered scenes. The rich and precise high-level semantics embodied in text could drastically benefit the understanding of the world around us. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated texts and predicting unambiguous scene text information, i.e., accurately localizing and recognizing a specific targeted text instance in a cluttered image from natural language descriptions (referring expressions). To address this problem, first, a novel recurrent dense text localization network (DTLN) is proposed to sequentially decode the intermediate convolutional representations of a cluttered scene image into a set of distinct text instance detections. Our approach avoids repeated text detections at multiple scales by recurrently memorizing previous detections, and it effectively tackles crowded text instances in close proximity. Second, we propose a context reasoning text retrieval (CRTR) model, which jointly encodes text instances and their context information through a recurrent network, and ranks localized text bounding boxes by a scoring function of context compatibility. Third, a recurrent text recognition module is introduced to extend the applicability of the aforementioned DTLN and CRTR models via text verification or transcription. Quantitative evaluations on standard scene text extraction benchmarks and a newly collected scene text retrieval dataset demonstrate the effectiveness and advantages of our models for the joint scene text localization, retrieval, and recognition task.


Subject(s)
Algorithms; Semantics
8.
IEEE Trans Fuzzy Syst ; 29(1): 34-45, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33408453

ABSTRACT

Traditional deep learning methods are sub-optimal at classifying ambiguous features, which often arise in noisy and hard-to-predict categories, especially when distinguishing semantic scores. Semantic scoring, which depends on semantic logic for evaluation, inevitably contains fuzzy descriptions and misses some concepts; for example, the ambiguous relationship between normal and probably normal always presents unclear boundaries (normal - more likely normal - probably normal). Thus, human error is common when annotating images. Differing from existing methods that focus on modifying the kernel structure of neural networks, this study proposes a dominant fuzzy fully connected layer (FFCL) for Breast Imaging Reporting and Data System (BI-RADS) scoring and validates the universality of the proposed structure. The proposed model aims to develop complementary properties of scoring for semantic paradigms, constructing fuzzy rules based on analyzing human thought patterns, and in particular to reduce the influence of semantic conglutination. Specifically, the semantic-sensitive defuzzifier layer projects features occupied by relative categories into a semantic space, and a fuzzy decoder modifies the probabilities of the last output layer with reference to the global trend. Moreover, the ambiguous semantic space between two relative categories shrinks during the learning phases, as the positive and negative growth trends of one category appearing among its relatives are considered. We first used the Euclidean distance (ED) to measure the gap between the real scores and the predicted scores, and then employed a two-sample t-test to evidence the advantage of the FFCL architecture. Extensive experimental results on the CBIS-DDSM dataset show that our FFCL structure achieves superior performance for both triple and multiclass classification in BI-RADS scoring, outperforming state-of-the-art methods.
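A minimal sketch of the two evaluation steps named above, the Euclidean distance between real and predicted scores and a two-sample t-test; the score vectors are synthetic stand-ins, not CBIS-DDSM results:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for BI-RADS scores (assumed data, illustration only).
real = np.array([3, 4, 2, 5, 4, 3, 4, 2, 5, 3], dtype=float)
pred_ffcl = real + np.random.default_rng(0).normal(0, 0.3, real.size)
pred_base = real + np.random.default_rng(1).normal(0, 0.9, real.size)

# Euclidean distance between the real and predicted score vectors.
print(np.linalg.norm(real - pred_ffcl), np.linalg.norm(real - pred_base))

# Two-sample t-test on the per-sample absolute errors of the two models.
t, p = stats.ttest_ind(np.abs(real - pred_ffcl), np.abs(real - pred_base))
print(f"t = {t:.2f}, p = {p:.4f}")
```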

9.
IEEE Trans Pattern Anal Mach Intell ; 43(11): 4037-4058, 2021 11.
Article in English | MEDLINE | ID: mdl-32386141

ABSTRACT

Large-scale labeled data are generally required to train deep neural networks to obtain good performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, a subset of unsupervised learning methods, have been proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures used for self-supervised learning are summarized. Next, the schema and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used datasets for images, videos, audio, and 3D data, as well as the existing self-supervised visual feature learning methods. Quantitative performance comparisons of the reviewed methods on benchmark datasets are then summarized and discussed for both image and video feature learning. Finally, the paper concludes with a set of promising future directions for self-supervised visual feature learning.
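A classic instance of the pretext tasks such surveys cover is rotation prediction, where the supervisory signal comes free with the data; a minimal sketch (a generic illustration, not any specific reviewed method):

```python
import torch
import torch.nn as nn

def make_rotation_batch(images):
    """Rotate each image by 0/90/180/270 degrees; the rotation index serves
    as a label that requires no human annotation."""
    rotated, labels = [], []
    for k in range(4):
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 4)  # predicts which of the four rotations was applied

images = torch.rand(8, 3, 32, 32)  # unlabeled images
x, y = make_rotation_batch(images)
loss = nn.functional.cross_entropy(head(backbone(x)), y)
loss.backward()  # learns general features without human labels
```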

10.
Comput Med Imaging Graph ; 87: 101817, 2021 01.
Article in English | MEDLINE | ID: mdl-33278767

ABSTRACT

Lung segmentation in computed tomography (CT) images plays an important role in the diagnosis of various lung diseases. Most current lung segmentation approaches are performed through a series of procedures with manual, empirical parameter adjustments at each step. Pursuing an automatic segmentation method with fewer steps, we propose a novel deep learning Generative Adversarial Network (GAN)-based lung segmentation schema, which we denote as LGAN. The proposed schema can be generalized to different kinds of neural networks for lung segmentation in CT images. We evaluated the LGAN schema on datasets including the Lung Image Database Consortium image collection (LIDC-IDRI) and the Quantitative Imaging Network (QIN) collection with two metrics: segmentation quality and shape similarity. We also compared our work with current state-of-the-art methods. The experimental results demonstrate that the proposed LGAN schema can serve as a promising tool for automatic lung segmentation, owing to its simplified procedure as well as its improved performance and efficiency.


Subject(s)
Image Processing, Computer-Assisted; Tomography, X-Ray Computed; Databases, Factual; Lung/diagnostic imaging; Neural Networks, Computer
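Segmentation quality in evaluations like this is commonly summarized by the Dice coefficient; a minimal sketch of this standard metric (not the paper's exact evaluation code):

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice overlap between two binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

pred = np.zeros((128, 128), dtype=np.uint8); pred[30:90, 40:100] = 1
gt = np.zeros((128, 128), dtype=np.uint8); gt[35:95, 40:100] = 1
print(dice_coefficient(pred, gt))  # ~0.917
```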
11.
IEEE Trans Image Process ; 29: 225-236, 2020.
Article in English | MEDLINE | ID: mdl-31329556

ABSTRACT

Deep neural network-based semantic segmentation generally requires large-scale, costly annotations for training to obtain good performance. To avoid the pixel-wise segmentation annotations that most methods need, some researchers have recently attempted to use object-level labels (e.g., bounding boxes) or image-level labels (e.g., image categories). In this paper, we propose a novel recursive coarse-to-fine semantic segmentation framework based only on image-level category labels. For each image, an initial coarse mask is first generated by a convolutional neural network-based unsupervised foreground segmentation model and then enhanced by a graph model. The enhanced coarse mask is fed to a fully convolutional neural network to be recursively refined. Unlike existing image-level label-based semantic segmentation methods, which require labels for all categories in images that contain multiple types of objects, our framework needs only one label per image and can handle images containing multi-category objects. Trained only on ImageNet, our framework achieves performance on the PASCAL VOC dataset comparable to other state-of-the-art image-level label-based semantic segmentation methods. Furthermore, our framework can be easily extended to the foreground object segmentation task and achieves performance comparable to state-of-the-art supervised methods on the Internet object dataset.
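A minimal sketch of the recursive refinement loop described above; the single-layer `refiner` is a hypothetical stand-in for the paper's fully convolutional network:

```python
import torch
import torch.nn as nn

def recursive_refine(image, coarse_mask, refiner, n_rounds=3):
    """Concatenate the image with the current mask, pass it through the
    refinement network, and reuse the output as the next round's input."""
    mask = coarse_mask
    for _ in range(n_rounds):
        mask = torch.sigmoid(refiner(torch.cat([image, mask], dim=1)))
    return mask

refiner = nn.Conv2d(4, 1, 3, padding=1)  # 3 image channels + 1 mask channel
image = torch.rand(1, 3, 64, 64)
coarse = torch.rand(1, 1, 64, 64)        # e.g., from the unsupervised model
print(recursive_refine(image, coarse, refiner).shape)  # (1, 1, 64, 64)
```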

12.
Article in English | MEDLINE | ID: mdl-31369378

ABSTRACT

Text instances provide valuable information for the understanding and interpretation of natural scenes. The rich, precise high-level semantics embodied in text could be beneficial for understanding the world around us and empower a wide range of real-world applications. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated texts and predicting an unambiguous scene text segmentation mask, i.e., scene text segmentation from natural language descriptions (referring expressions) such as "orange text on a little boy in black swinging a bat". The solution to this novel problem enables accurate segmentation of scene text instances from complex backgrounds. In our proposed framework, a unified deep network jointly models visual and linguistic information by encoding both region-level and pixel-level visual features of natural scene images into spatial feature maps, and then decodes them into a saliency response map of text instances. To conduct quantitative evaluations, we establish a new scene text referring expression segmentation dataset: COCO-CharRef. Experimental results demonstrate the effectiveness of the proposed framework on the text instance segmentation task. By combining image-based visual features with language-based textual explanations, our framework outperforms baselines derived from state-of-the-art text localization and natural language object retrieval methods on the COCO-CharRef dataset.
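One common way to fuse the two modalities, sketched below under assumed dimensions: tile the referring-expression embedding over the spatial grid, concatenate it with the visual feature map, and decode to a one-channel saliency response map. This is a generic design, not necessarily the paper's exact network:

```python
import torch
import torch.nn as nn

visual = torch.rand(1, 64, 32, 32)  # spatial visual features of the image
text = torch.rand(1, 128)           # referring-expression embedding (assumed dims)

# Tile the language vector at every spatial location and concatenate.
text_tiled = text[:, :, None, None].expand(-1, -1, 32, 32)
fused = torch.cat([visual, text_tiled], dim=1)  # (1, 192, 32, 32)

decoder = nn.Conv2d(192, 1, kernel_size=1)      # per-pixel text saliency
saliency = torch.sigmoid(decoder(fused))
print(saliency.shape)  # torch.Size([1, 1, 32, 32])
```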

13.
IEEE Trans Image Process ; 28(11): 5241-5252, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31135361

ABSTRACT

In this paper, a self-guiding multimodal LSTM (sgLSTM) image captioning model is proposed to handle an uncontrolled, imbalanced real-world image-sentence dataset. We collect FlickrNYC, a dataset of 306,165 images from Flickr, as our testbed; the original text descriptions uploaded by users are utilized as the ground truth for training. Descriptions in the FlickrNYC dataset vary dramatically, ranging from short phrase-level descriptions to long paragraph-level descriptions, and can describe any visual aspect, or even refer to objects that are not depicted. To deal with this imbalanced and noisy situation and to fully exploit the dataset itself, we propose a novel guiding textual feature extracted using a multimodal LSTM (mLSTM) model. Training of the mLSTM is based on the portion of data in which the image content and the corresponding descriptions are strongly bonded. Afterward, during the training of sgLSTM on the remaining training data, this guiding information serves as additional input to the network, along with the image representations and the ground-truth descriptions. By integrating these input components into a multimodal block, we aim to form a training scheme in which the textual information is tightly coupled with the image content. The experimental results demonstrate that the proposed sgLSTM model outperforms the traditional state-of-the-art multimodal RNN captioning framework in successfully describing the key components of input images.
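A minimal sketch of a multimodal LSTM decoding step in which the image feature, the guiding textual feature, and the previous word embedding are fused by concatenation; all dimensions and the fusion scheme are assumptions for illustration:

```python
import torch
import torch.nn as nn

img_dim, guide_dim, word_dim, hidden, vocab = 256, 128, 300, 512, 10000
cell = nn.LSTMCell(img_dim + guide_dim + word_dim, hidden)
to_vocab = nn.Linear(hidden, vocab)  # hypothetical vocabulary size
embed = nn.Embedding(vocab, word_dim)

img_feat = torch.rand(1, img_dim)      # CNN image representation
guide_feat = torch.rand(1, guide_dim)  # guiding textual feature from the mLSTM
h = c = torch.zeros(1, hidden)
word = embed(torch.zeros(1, dtype=torch.long))  # <start> token embedding

for _ in range(5):  # greedily decode a 5-word caption
    h, c = cell(torch.cat([img_feat, guide_feat, word], dim=1), (h, c))
    next_id = to_vocab(h).argmax(dim=1)
    word = embed(next_id)
```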

14.
IEEE Trans Mob Comput ; 18(3): 702-714, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30774566

ABSTRACT

This paper presents a new holistic vision-based mobile assistive navigation system to help blind and visually impaired people with independent indoor travel. The system detects dynamic obstacles and adjusts path planning in real time to improve navigation safety. First, we develop an indoor map editor to parse geometric information from architectural models and generate a semantic map consisting of a global 2D traversable grid map layer and context-aware layers. By leveraging the visual positioning service (VPS) within the Google Tango device, we design a map alignment algorithm to bridge the visual area description file (ADF) and the semantic map and so achieve semantic localization. Using the on-board RGB-D camera, we develop an efficient obstacle detection and avoidance approach based on a time-stamped map Kalman filter (TSM-KF) algorithm. A multi-modal human-machine interface (HMI) is designed with speech-audio interaction and robust haptic interaction through an electronic SmartCane. Finally, field experiments with blindfolded and blind subjects demonstrate that the proposed system provides an effective tool to help blind individuals with indoor navigation and wayfinding.
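At the core of the obstacle tracking is a Kalman filter; below is a minimal one-dimensional constant-velocity sketch of the predict-update cycle (the time-stamped map specifics of TSM-KF are beyond this toy example):

```python
import numpy as np

# 1D constant-velocity Kalman filter: state = [position, velocity].
dt = 0.1
F = np.array([[1, dt], [0, 1]])  # state transition
H = np.array([[1, 0]])           # only position is measured
Q = np.eye(2) * 1e-3             # process noise covariance
R = np.array([[0.05]])           # measurement noise covariance

x = np.array([[0.0], [0.0]])     # initial state estimate
P = np.eye(2)                    # initial state covariance

for z in [0.11, 0.22, 0.29, 0.41, 0.52]:  # noisy range readings (meters)
    x = F @ x                    # predict
    P = F @ P @ F.T + Q
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
    x = x + K @ (np.array([[z]]) - H @ x)         # update with measurement
    P = (np.eye(2) - K @ H) @ P

print(f"position {x[0, 0]:.2f} m, velocity {x[1, 0]:.2f} m/s")
```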

15.
IEEE Trans Pattern Anal Mach Intell ; 39(5): 1028-1039, 2017 05.
Article in English | MEDLINE | ID: mdl-28113701

ABSTRACT

The advent of cost-effective and easy-to-operate depth cameras has facilitated a variety of visual recognition tasks, including human activity recognition. This paper presents a novel framework for recognizing human activities from video sequences captured by depth cameras. We extend the surface normal to the polynormal by assembling local neighboring hypersurface normals from a depth sequence to jointly characterize local motion and shape information. We then propose a general scheme of super normal vector (SNV) to aggregate the low-level polynormals into a discriminative representation, which can be viewed as a simplified version of the Fisher kernel representation. In order to globally capture the spatial layout and temporal order, an adaptive spatio-temporal pyramid is introduced to subdivide a depth video into a set of space-time cells. In extensive experiments, the proposed approach achieves superior performance to state-of-the-art methods on four public benchmark datasets, i.e., MSRAction3D, MSRDailyActivity3D, MSRGesture3D, and MSRActionPairs3D.
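A minimal sketch of the surface-normal computation that polynormals build on, using finite differences on a depth map; the paper's hypersurface normals extend this idea to the 4D depth sequence:

```python
import numpy as np

def depth_to_normals(depth):
    """Per-pixel surface normals from a depth map via finite differences."""
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    normals = np.dstack([-dzdx, -dzdy, np.ones_like(depth)])
    return normals / np.linalg.norm(normals, axis=2, keepdims=True)

depth = np.fromfunction(lambda y, x: 2.0 + 0.01 * x, (240, 320))  # tilted plane
n = depth_to_normals(depth)
print(n[120, 160])  # constant normal everywhere on a planar surface
```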

16.
IEEE Trans Neural Netw Learn Syst ; 26(9): 2200-5, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25420271

ABSTRACT

A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. Learning-based classifiers achieve state-of-the-art accuracies but have been criticized for computational complexity that grows linearly with the number of classes. Nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, the discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree grows only sublinearly with the number of categories, which is much better than recent hierarchical support vector machine-based methods. The memory requirement is an order of magnitude less than that of recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves state-of-the-art accuracies with significantly lower computational cost and memory requirements.
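A minimal sketch of the plain hierarchical k-means tree backbone (the discriminative weighting that makes it a D-HKTree is omitted); lookups descend one branch per level, giving cost sublinear in the number of stored items:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_hktree(X, branching=4, min_leaf=32, depth=0, max_depth=5):
    """Recursively partition features with k-means into a tree."""
    if len(X) <= min_leaf or depth >= max_depth:
        return {"leaf": True, "points": X}
    km = KMeans(n_clusters=branching, n_init=4, random_state=0).fit(X)
    children = [build_hktree(X[km.labels_ == i], branching, min_leaf, depth + 1)
                for i in range(branching)]
    return {"leaf": False, "centers": km.cluster_centers_, "children": children}

def descend(tree, q):
    """Greedy lookup: follow the nearest cluster center at each level."""
    while not tree["leaf"]:
        i = np.argmin(np.linalg.norm(tree["centers"] - q, axis=1))
        tree = tree["children"][i]
    return tree["points"]

X = np.random.rand(2000, 16)
tree = build_hktree(X)
print(len(descend(tree, X[0])))  # candidate set far smaller than 2000
```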

17.
IEEE Trans Image Process ; 23(7): 2972-82, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24759989

ABSTRACT

Text characters and strings in natural scenes can provide valuable information for many applications. Extracting text directly from natural scene images or videos is a challenging task because of diverse text patterns and varied background interference. This paper proposes a method of scene text recognition from detected text regions. For text detection, our previously proposed algorithms are applied to obtain text regions from a scene image. First, we design a discriminative character descriptor by combining several state-of-the-art feature detectors and descriptors. Second, we model character structure for each character class by designing stroke configuration maps. Our algorithm design is compatible with the application of scene text extraction on smart mobile devices. An Android-based demo system is developed to show the effectiveness of our proposed method for extracting scene text information from nearby objects. The demo system also provides insight into algorithm design and performance improvement for scene text extraction. Evaluation results on benchmark datasets demonstrate that our proposed text recognition scheme is comparable with the best existing methods.


Subject(s)
Algorithms; Image Processing, Computer-Assisted/methods; Mobile Applications; Pattern Recognition, Automated/methods; Data Mining; Writing
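As a small illustration of combining feature detectors with a common descriptor, as this record describes, the sketch below computes SIFT descriptors both at SIFT's own keypoints and at FAST corners on a hypothetical scene image (file name assumed):

```python
import cv2

img = cv2.imread("scene_text.jpg", cv2.IMREAD_GRAYSCALE)  # assumed local file

# SIFT keypoints + SIFT descriptors (128-D each).
sift = cv2.SIFT_create()
kps_sift, desc_sift = sift.detectAndCompute(img, None)

# FAST corner detector paired with SIFT description: one way to mix
# detectors and descriptors when building a combined character descriptor.
fast = cv2.FastFeatureDetector_create()
kps_fast = fast.detect(img, None)
kps_fast, desc_fast = sift.compute(img, kps_fast)

print(len(kps_sift), len(kps_fast))
```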
18.
Netw Model Anal Health Inform Bioinform ; 2(2): 61-70, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23914344

ABSTRACT

In this paper, we present a new framework to monitor medication intake for elderly individuals by incorporating a video camera and Radio Frequency Identification (RFID) sensors. The proposed framework can provide a key function for monitoring the activities of daily living (ADLs) of elderly people in their own homes. In an assistive environment, RFID tags are applied to medicine bottles located in a medicine cabinet so that each medicine bottle has a unique ID, and the description of the medicine for each tag is manually entered into a database. RFID readers detect whether any of these bottles are taken away from the medicine cabinet and identify the tag attached to the medicine bottle. A video camera is installed to continuously monitor the activity of taking medicine by integrating face detection and tracking, mouth detection, background subtraction, and activity detection. The preliminary results demonstrate 100% detection accuracy for identifying medicine bottles and promising results for monitoring the activity of taking medicine.
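Background subtraction, one building block of the monitoring pipeline above, can be sketched with OpenCV's standard MOG2 subtractor; the video path and motion threshold are assumptions:

```python
import cv2

cap = cv2.VideoCapture("cabinet_camera.mp4")  # hypothetical camera recording
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)     # moving pixels become foreground
    if cv2.countNonZero(mask) > 5000:  # assumed activity threshold
        print("motion detected")
cap.release()
```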

19.
Netw Model Anal Health Inform Bioinform ; 2(2): 81-93, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23914345

ABSTRACT

Signage plays a very important role in finding destinations in navigation and wayfinding applications. In this paper, we propose a novel framework to detect doors and signage to help blind people access unfamiliar indoor environments. In order to eliminate interfering information and improve the accuracy of signage detection, we first extract attended areas using a saliency map. Signage is then detected within the attended areas using bipartite graph matching. The proposed method can handle multiple signage detections. Furthermore, in order to provide more information for blind users to access the area associated with the detected signage, we develop a robust method to detect doors based on a geometric door frame model that is independent of door appearance. Experimental results on our collected datasets of indoor signage and doors demonstrate the effectiveness and efficiency of the proposed method.
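Bipartite matching between candidate regions and signage templates can be solved with the Hungarian algorithm; a minimal sketch with a synthetic cost matrix (real costs would come from feature dissimilarity):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[0.2, 0.9, 0.8],   # rows: detected candidate regions
                 [0.7, 0.1, 0.9],   # cols: signage templates
                 [0.8, 0.9, 0.3]])
rows, cols = linear_sum_assignment(cost)  # minimum-cost one-to-one matching
for r, c in zip(rows, cols):
    print(f"candidate {r} -> template {c} (cost {cost[r, c]:.1f})")
```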

20.
Netw Model Anal Health Inform Bioinform ; 2(2): 71-79, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23894729

ABSTRACT

Computer vision technology has been widely used for blind assistance, such as navigation and wayfinding. However, few camera-based systems have been developed to help blind or visually impaired people find daily necessities. In this paper, we propose a prototype system for blind-assistant object finding using a camera-based network and matching-based recognition. We collect a dataset of daily necessities and apply Speeded-Up Robust Features (SURF) and Scale Invariant Feature Transform (SIFT) feature descriptors to perform object recognition. Experimental results demonstrate the effectiveness of our prototype system.
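A minimal sketch of matching-based recognition with SIFT and Lowe's ratio test; the image paths are hypothetical, and SIFT stands in for both descriptors since SURF is unavailable in many stock OpenCV builds:

```python
import cv2

query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)  # assumed files
reference = cv2.imread("reference_bottle.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kq, dq = sift.detectAndCompute(query, None)
kr, dr = sift.detectAndCompute(reference, None)

# Keep matches whose best distance clearly beats the second best.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(dq, dr, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} good matches; declare a recognition above a set threshold")
```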
