2.
IEEE Trans Image Process ; 33: 5392-5407, 2024.
Article in English | MEDLINE | ID: mdl-39312416

ABSTRACT

The analysis and prediction of visual attention have long been crucial tasks in computer vision and image processing. In practical applications, images are generally accompanied by various text descriptions; however, few studies have explored the influence of text descriptions on visual attention, let alone developed visual saliency prediction models that consider text guidance. In this paper, we conduct a comprehensive study on text-guided image saliency (TIS) from both subjective and objective perspectives. Specifically, we construct a TIS database named SJTU-TIS, which includes 1200 text-image pairs and the corresponding collected eye-tracking data. Based on the established SJTU-TIS database, we analyze the influence of various text descriptions on visual attention. Then, to facilitate the development of saliency prediction models that consider text influence, we construct a benchmark for the SJTU-TIS database using state-of-the-art saliency models. Finally, since most existing saliency models ignore the effect of text descriptions on visual attention, we further propose a text-guided saliency (TGSal) prediction model, which extracts and integrates both image and text features to predict image saliency under various text-description conditions. Our proposed model significantly outperforms state-of-the-art saliency models on both the SJTU-TIS database and pure image saliency databases in terms of various evaluation metrics. The SJTU-TIS database and the code of the proposed TGSal model will be released at: https://github.com/IntMeGroup/TGSal.
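
The repository above will carry the authors' implementation; as a rough, hedged illustration of the general idea of fusing text and image features for saliency, a toy PyTorch sketch follows. The module name, dimensions, and gating design are our own assumptions, not the TGSal architecture.

```python
# Toy sketch (not the authors' TGSal code) of text-conditioned saliency:
# a text embedding gates backbone image features before a small decoder.
import torch
import torch.nn as nn

class TextGuidedSaliency(nn.Module):
    def __init__(self, img_channels=256, txt_dim=512):
        super().__init__()
        # project the text embedding into the image feature channel space
        self.txt_proj = nn.Linear(txt_dim, img_channels)
        # 1x1 conv produces a spatial attention map from the fused features
        self.attn = nn.Conv2d(img_channels, 1, kernel_size=1)
        self.decoder = nn.Sequential(
            nn.Conv2d(img_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1), nn.Sigmoid())  # saliency map in [0, 1]

    def forward(self, img_feat, txt_emb):
        # img_feat: (B, C, H, W) backbone features; txt_emb: (B, txt_dim)
        t = self.txt_proj(txt_emb)[:, :, None, None]          # (B, C, 1, 1)
        gated = img_feat * torch.sigmoid(self.attn(img_feat * t))
        return self.decoder(gated)                            # (B, 1, H, W)

sal = TextGuidedSaliency()(torch.randn(2, 256, 28, 28), torch.randn(2, 512))
print(sal.shape)  # torch.Size([2, 1, 28, 28])
```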

3.
Article in English | MEDLINE | ID: mdl-39167507

ABSTRACT

The rapid development of Multi-modality Large Language Models (MLLMs) has driven a paradigm shift in computer vision toward versatile foundational models. However, evaluating MLLMs on low-level visual perception and understanding remains a largely unexplored domain. To this end, we design benchmark settings to emulate human language responses related to low-level vision: low-level visual perception (A1) via visual question answering related to low-level attributes (e.g., clarity, lighting); and low-level visual description (A2), evaluating MLLMs on low-level text descriptions. Furthermore, given that pairwise comparison can better avoid ambiguity of responses and has been adopted by many human experiments, we further extend the low-level perception-related question-answering and description evaluations of MLLMs from single images to image pairs. Specifically, for perception (A1), we construct the LLVisionQA+ dataset, comprising 2,990 single images and 1,999 image pairs, each accompanied by an open-ended question about its low-level features; for description (A2), we propose the LLDescribe+ dataset, evaluating MLLMs on low-level descriptions of 499 single images and 450 pairs. Additionally, we evaluate MLLMs on assessment (A3) ability, i.e., score prediction, by employing a softmax-based approach that enables all MLLMs to generate quantifiable quality ratings, tested against human opinions on 7 image quality assessment (IQA) datasets. With 24 MLLMs under evaluation, we demonstrate that several MLLMs have decent low-level visual competencies on single images, but only GPT-4V exhibits higher accuracy on pairwise comparisons than on single-image evaluations (like humans). We hope that our benchmark will motivate further research into uncovering and enhancing these nascent capabilities of MLLMs. Datasets will be available at https://github.com/Q-Future/Q-Bench.
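
The softmax-based scoring idea mentioned for A3 is commonly implemented by taking the model's next-token logits for a set of rating words and converting them into an expected rating. A hedged sketch under that assumption follows; the token ids and five-level weighting are illustrative, not the paper's exact code.

```python
# Sketch: derive a continuous quality score from an MLLM's logits at the
# rating position, instead of parsing free-form text output.
import torch

def quality_score(logits, level_token_ids, level_weights=(5, 4, 3, 2, 1)):
    """logits: (vocab,) next-token logits; level_token_ids: ids of the
    rating words (e.g., 'excellent' ... 'bad'); returns expected rating."""
    level_logits = logits[list(level_token_ids)]
    probs = torch.softmax(level_logits, dim=0)        # closed-set softmax
    weights = torch.tensor(level_weights, dtype=probs.dtype)
    return (probs * weights).sum().item()             # probability-weighted

fake_logits = torch.randn(32000)                      # stand-in vocabulary
print(quality_score(fake_logits, level_token_ids=[101, 202, 303, 404, 505]))
```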

4.
JAMA Netw Open ; 7(8): e2425124, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39106068

ABSTRACT

IMPORTANCE: Identifying pediatric eye diseases at an early stage is a worldwide issue. Traditional screening procedures depend on hospitals and ophthalmologists, which are expensive and time-consuming. Using artificial intelligence (AI) to assess children's eye conditions from mobile photographs could facilitate convenient and early identification of eye disorders in a home setting. OBJECTIVE: To develop an AI model to identify myopia, strabismus, and ptosis using mobile photographs. DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study was conducted at the Department of Ophthalmology of Shanghai Ninth People's Hospital from October 1, 2022, to September 30, 2023, and included children who were diagnosed with myopia, strabismus, or ptosis. MAIN OUTCOMES AND MEASURES: A deep learning-based model was developed to identify myopia, strabismus, and ptosis. The performance of the model was assessed using sensitivity, specificity, accuracy, the area under the curve (AUC), positive predictive values (PPV), negative predictive values (NPV), positive likelihood ratios (P-LR), negative likelihood ratios (N-LR), and the F1-score. GradCAM++ was utilized to visually and analytically assess the impact of each region on the model. A sex subgroup analysis and an age subgroup analysis were performed to validate the model's generalizability. RESULTS: A total of 1419 images obtained from 476 patients (225 female [47.27%]; 299 [62.82%] aged between 6 and 12 years) were used to build the model. Among them, 946 monocular images were used to identify myopia and ptosis, and 473 binocular images were used to identify strabismus. The model demonstrated good sensitivity in detecting myopia (0.84 [95% CI, 0.82-0.87]), strabismus (0.73 [95% CI, 0.70-0.77]), and ptosis (0.85 [95% CI, 0.82-0.87]). The model showed comparable performance in identifying eye disorders in both female and male children during sex subgroup analysis. There were differences in identifying eye disorders among different age subgroups. CONCLUSIONS AND RELEVANCE: In this cross-sectional study, the AI model demonstrated strong performance in accurately identifying myopia, strabismus, and ptosis using only smartphone images. These results suggest that such a model could facilitate the early detection of pediatric eye diseases in a convenient manner at home.
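
For readers who want to reproduce the reported screening statistics, the metrics listed above all follow from the confusion counts; a small self-contained sketch with made-up counts (not study data):

```python
# Standard screening metrics from a 2x2 confusion table.
def screening_metrics(tp, fp, tn, fn):
    sens = tp / (tp + fn)                 # sensitivity (recall)
    spec = tn / (tn + fp)                 # specificity
    ppv = tp / (tp + fp)                  # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    plr = sens / (1 - spec)               # positive likelihood ratio
    nlr = (1 - sens) / spec               # negative likelihood ratio
    f1 = 2 * ppv * sens / (ppv + sens)    # F1-score
    return dict(sensitivity=sens, specificity=spec, PPV=ppv,
                NPV=npv, P_LR=plr, N_LR=nlr, F1=f1)

print(screening_metrics(tp=84, fp=12, tn=88, fn=16))  # illustrative counts
```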


Subjects
Artificial Intelligence , Early Diagnosis , Photography , Humans , Female , Male , Cross-Sectional Studies , Child , Child, Preschool , Photography/methods , Myopia/diagnosis , Deep Learning , Strabismus/diagnosis , Blepharoptosis/diagnosis , Sensitivity and Specificity , China/epidemiology , Eye Diseases/diagnosis , Adolescent
5.
Int Dent J ; 2024 Aug 03.
Article in English | MEDLINE | ID: mdl-39098480

ABSTRACT

INTRODUCTION AND AIMS: In the face of escalating oral cancer rates, the application of large language models like Generative Pretrained Transformer (GPT)-4 presents a novel pathway for enhancing public awareness about prevention and early detection. This research aims to explore the capabilities and possibilities of GPT-4 in addressing open-ended inquiries in the field of oral cancer. METHODS: Using 60 questions accompanied by reference answers, covering concepts, causes, treatments, nutrition, and other aspects of oral cancer, evaluators from diverse backgrounds were selected to assess the capabilities of GPT-4 and a customized version. A P value under .05 was considered significant. RESULTS: Analysis revealed that GPT-4 and its adaptations notably excelled in answering open-ended questions, with the majority of responses receiving high scores. Although the median score for standard GPT-4 was marginally better, statistical tests showed no significant difference in capabilities between the two models (P > .05). Although evaluators from diverse backgrounds rated the responses differently (P < .05), a post hoc test and comprehensive analysis demonstrated that both editions of GPT-4 showed equivalent capabilities in answering questions concerning oral cancer. CONCLUSIONS: GPT-4 has demonstrated its capability to furnish responses to open-ended inquiries concerning oral cancer. Utilizing this advanced technology to boost public awareness about oral cancer is viable and has considerable potential. When GPT-4 is unable to locate pertinent information, it resorts to its inherent knowledge base or recommends consulting professionals after offering some basic information. Therefore, it cannot supplant the expertise and clinical judgment of surgical oncologists and could be used as an adjunctive evaluation tool.
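
The abstract does not name the exact statistical procedures; assuming conventional nonparametric choices for ordinal ratings, a comparison of this shape could look like the sketch below (synthetic scores, hypothetical test selection).

```python
# Hypothetical analysis shape: two-model comparison plus an omnibus test
# across evaluator backgrounds. Test choices are our assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
gpt4 = rng.integers(3, 6, size=60)                 # synthetic 1-5 ratings
custom = rng.integers(3, 6, size=60)

u, p = stats.mannwhitneyu(gpt4, custom)
print(f"model comparison: U={u:.0f}, p={p:.3f}")   # p > .05 -> no difference

groups = [rng.normal(4, 0.5, 20) for _ in range(3)]  # evaluator backgrounds
h, p = stats.kruskal(*groups)                      # omnibus; post hoc follows
print(f"evaluator backgrounds: H={h:.2f}, p={p:.3f}")
```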

6.
Comput Biol Med ; 180: 109025, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39159544

ABSTRACT

INTRODUCTION: In the treatment of malocclusion, continuous monitoring of the three-dimensional relationship between dental roots and the surrounding alveolar bone is essential for preventing complications from orthodontic procedures. Cone-beam computed tomography (CBCT) provides detailed root and bone data, but its high radiation dose limits its frequent use, necessitating an alternative for ongoing monitoring. OBJECTIVES: We aimed to develop a deep learning-based cross-temporal multimodal image fusion system for acquiring root and jawbone information without additional radiation, enhancing the ability of orthodontists to monitor risk. METHODS: Utilizing CBCT and intraoral scans (IOSs) as cross-temporal modalities, we integrated deep learning with multimodal fusion technologies to develop a system that includes a CBCT segmentation model for teeth and jawbones. This model incorporates a dynamic kernel prior model, resolution restoration, and an IOS segmentation network optimized for dense point clouds. Additionally, a coarse-to-fine registration module was developed. The system integrates IOS and CBCT images across varying spatial and temporal dimensions, enabling comprehensive reconstruction of root and jawbone information throughout the orthodontic treatment process. RESULTS: The experimental results demonstrate that our system not only maintains the original high resolution but also delivers strong segmentation performance on external testing datasets: Dice coefficients of 94.1% and 94.4% for teeth and jawbones in CBCT, respectively, and 91.7% for IOS segmentation. Additionally, in real-world registration, the system achieved an average distance error (ADE) of 0.43 mm for teeth and 0.52 mm for jawbones, significantly reducing processing time. CONCLUSION: We developed the first deep learning-based cross-temporal multimodal fusion system, addressing the critical challenge of continuous risk monitoring in orthodontic treatment without additional radiation exposure. We hope this study will catalyze transformative advancements in risk management strategies and treatment modalities, reshaping the landscape of future orthodontic practice.
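
As a quick reference, the two evaluation measures quoted above can be computed as in the sketch below; shapes and data are illustrative, not from the study.

```python
# Dice coefficient for segmentation overlap and average distance error
# (ADE) between corresponding registered points (assumed in mm).
import numpy as np

def dice(mask_a, mask_b):
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum())

def average_distance_error(points_a, points_b):
    # mean Euclidean distance between corresponding (N, 3) points
    return np.linalg.norm(points_a - points_b, axis=1).mean()

a = np.zeros((64, 64), bool); a[10:40, 10:40] = True
b = np.zeros((64, 64), bool); b[12:42, 12:42] = True
print(dice(a, b))
print(average_distance_error(np.random.rand(100, 3), np.random.rand(100, 3)))
```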


Subjects
Cone-Beam Computed Tomography , Deep Learning , Humans , Cone-Beam Computed Tomography/methods , Orthodontics/methods , Malocclusion/diagnostic imaging , Malocclusion/therapy
7.
Endocrine ; 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39046593

ABSTRACT

PURPOSE: Thyroid eye disease (TED) is the most common orbital disease in adults. Restricted ocular motility is patients' primary complaint, yet it is difficult to evaluate. The present study aimed to introduce an artificial intelligence (AI) model based on orbital computed tomography (CT) images for ocular motility scoring. METHODS: A total of 410 sets of CT images and clinical data were obtained from the hospital. To build a three-class predictive model for the ocular motility score, multiple deep learning models were employed to extract features from images and clinical data. Subgroup analyses based on pertinent clinical features were performed to test the efficacy of the models. RESULTS: The ResNet-34 network outperformed Alex-Net and VGG16-Net in predicting the ocular motility score, with accuracies (ACC) of 0.907, 0.870, and 0.890, respectively. Subgroup analyses indicated no significant difference in ACC between active and inactive phases, or between functional visual field diplopia and peripheral visual field diplopia (p > 0.05). However, in the sex subgroup, the prediction model performed more accurately in female patients than in male patients (p = 0.02). CONCLUSION: The AI model based on CT images and clinical data successfully realized automatic scoring of ocular motility in TED patients. This approach can potentially enhance the efficiency and accuracy of ocular motility evaluation, thus facilitating clinical application.
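
A minimal sketch of the image branch named above, a ResNet-34 with a three-class head, using torchvision; the clinical-feature branch and training details from the paper are omitted, and this setup is an assumption for illustration only.

```python
# Three-class scorer on a ResNet-34 backbone (torchvision).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet34(weights=None)          # pretrained weights optional
model.fc = nn.Linear(model.fc.in_features, 3)  # triple-classification head

scores = model(torch.randn(4, 3, 224, 224))    # (batch, 3) class logits
print(scores.argmax(dim=1))                    # predicted motility class
```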

8.
IEEE Trans Pattern Anal Mach Intell ; 46(11): 7056-7071, 2024 Nov.
Article in English | MEDLINE | ID: mdl-38625773

ABSTRACT

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets by designing minimalistic BVQA models. By minimalistic, we restrict our family of BVQA models to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy-dataset problem to varying degrees, and some even admit blind image quality assessment (BIQA) solutions. We further justify our claims by comparing the generalization capabilities of our models on these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA and, meanwhile, shed light on good practices for constructing next-generation VQA datasets and models.
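
As a hedged reading of the "minimalistic" design, the sketch below wires the four named blocks together with placeholder instantiations; the authors' actual choices (downsampling rates, analyzers, regressor) differ.

```python
# Schematic minimalistic BVQA pipeline: preprocessor -> spatial analyzer
# -> optional temporal analyzer -> quality regressor. Toy instantiations.
import torch
import torch.nn as nn

class MinimalisticBVQA(nn.Module):
    def __init__(self, use_temporal=False):
        super().__init__()
        self.spatial = nn.Sequential(              # toy spatial analyzer
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.temporal = nn.GRU(16, 16, batch_first=True) if use_temporal else None
        self.regressor = nn.Linear(16, 1)          # scalar quality score

    def forward(self, video):                      # video: (B, T, 3, H, W)
        video = video[:, ::4, :, ::2, ::2]         # aggressive downsampling
        b, t = video.shape[:2]
        f = self.spatial(video.flatten(0, 1)).view(b, t, -1)
        if self.temporal is not None:
            f, _ = self.temporal(f)
        return self.regressor(f.mean(dim=1))       # pool frames, regress

print(MinimalisticBVQA()(torch.randn(2, 16, 3, 64, 64)).shape)  # (2, 1)
```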

9.
Comput Biol Med ; 174: 108431, 2024 May.
Article in English | MEDLINE | ID: mdl-38626507

ABSTRACT

Skin wrinkles result from intrinsic aging processes and extrinsic influences, including prolonged exposure to ultraviolet radiation and tobacco smoking. Hence, identifying wrinkles holds significant importance in skin aging and medical aesthetic research. Nevertheless, current methods lack the comprehensiveness to identify facial wrinkles, particularly those that may appear insignificant. Furthermore, current assessment techniques neglect the blurred boundaries of wrinkles and cannot differentiate between images of varying resolutions. This research introduces a novel wrinkle detection algorithm and a distance-based loss function to identify full-face wrinkles. Furthermore, we develop a wrinkle detection evaluation metric that assesses outcomes based on curve, location, and gradient similarity. We collected and annotated a wrinkle detection dataset consisting of 1021 images of Chinese faces. The dataset will be made publicly available to further promote wrinkle detection research. The experiments demonstrate a substantial improvement in detecting subtle wrinkles with the proposed method. Furthermore, the suggested evaluation procedure effectively accounts for the indistinct boundaries of wrinkles and is applicable to images of various resolutions.
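
The paper's distance-based loss is not specified in the abstract; one plausible construction, offered only as our reading of the idea, weights per-pixel BCE by proximity to the annotated wrinkle so that blurred boundaries are penalized softly.

```python
# Hypothetical distance-weighted BCE: pixels near an annotated wrinkle get
# higher weight, so ambiguity at blurred boundaries is penalized gently.
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def distance_weighted_bce(pred, target, sigma=5.0):
    # pred, target: (B, 1, H, W); distance map measured to wrinkle pixels
    dist = torch.stack([
        torch.from_numpy(distance_transform_edt(1 - t[0].numpy()))
        for t in target]).unsqueeze(1).float()
    weight = torch.exp(-dist / sigma)              # decays away from wrinkles
    return F.binary_cross_entropy(pred, target, weight=weight)

pred = torch.rand(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.95).float()
print(distance_weighted_bce(pred, target))
```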


Subjects
Algorithms , Databases, Factual , Face , Skin Aging , Humans , Skin Aging/physiology , Face/diagnostic imaging , Female , Male , Image Processing, Computer-Assisted/methods , Adult
10.
Comput Biol Med ; 174: 108399, 2024 May.
Article in English | MEDLINE | ID: mdl-38615461

ABSTRACT

Glaucoma is one of the leading causes of blindness worldwide. Individuals affected by glaucoma, including patients and their family members, frequently encounter a deficit in dependable support beyond the confines of clinical environments. However, seeking advice via the internet can be difficult due to the vast amount of disorganized and unstructured material available online. This research explores how Large Language Models (LLMs) can be leveraged to better serve medical research and benefit glaucoma patients. We introduce Xiaoqing, a Natural Language Processing (NLP) model specifically tailored to the glaucoma field, detailing its development and deployment. To evaluate its effectiveness, we conducted two forms of experiments: comparative and experiential. In the comparative analysis, we presented 22 glaucoma-related questions in simplified Chinese to three medical NLP models (Xiaoqing LLMs, HuaTuo, Ivy GPT) and two general models (ChatGPT-3.5 and ChatGPT-4), covering topics ranging from basic glaucoma knowledge to treatment, surgery, research, management standards, and patient lifestyle. Responses were assessed for informativeness and readability. The experiential experiment involved glaucoma patients and non-patients interacting with Xiaoqing; we collected and analyzed their questions and feedback on the same criteria. The findings demonstrated that Xiaoqing notably outperformed the other models in informativeness and readability, suggesting that Xiaoqing represents a significant advancement in the management and treatment of glaucoma in China. We also provide a Web-based version of Xiaoqing, allowing readers to directly experience its functionality, available at https://qa.glaucoma-assistant.com//qa.


Subjects
Glaucoma , Humans , Glaucoma/drug therapy , Glaucoma/physiopathology , Natural Language Processing , Male , Female
11.
IEEE Trans Image Process ; 33: 1898-1910, 2024.
Article in English | MEDLINE | ID: mdl-38451761

ABSTRACT

In this paper, we present a simple yet effective continual learning method for blind image quality assessment (BIQA) with improved quality prediction accuracy, plasticity-stability trade-off, and task-order/-length robustness. The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability, and learn task-specific normalization parameters for plasticity. We assign each new IQA dataset (i.e., task) a prediction head, and load the corresponding normalization parameters to produce a quality score. The final quality estimate is computed by a weighted summation of predictions from all heads with a lightweight K-means gating mechanism. Extensive experiments on six IQA datasets demonstrate the advantages of the proposed method in comparison to previous training techniques for BIQA.
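
A compact sketch of the recipe as described: freeze convolution filters for stability, keep normalization parameters and per-task heads trainable for plasticity, and blend head predictions with gating weights. The backbone choice, head shapes, and uniform gate below are simplifying assumptions (the paper uses a K-means gate).

```python
# Freeze conv weights; leave normalization affine params and heads trainable;
# blend per-task head outputs with gating weights (uniform stand-in here).
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=None)
for m in backbone.modules():
    if isinstance(m, nn.Conv2d):
        m.weight.requires_grad = False           # stability: filters frozen
# BatchNorm affine parameters remain trainable   # plasticity: per-task norm

num_tasks = 3
heads = nn.ModuleList([nn.Linear(1000, 1) for _ in range(num_tasks)])

x = torch.randn(2, 3, 224, 224)
feat = backbone(x)                               # (2, 1000) features
gate = torch.full((num_tasks,), 1.0 / num_tasks) # stand-in for K-means gate
score = sum(g * h(feat) for g, h in zip(gate, heads))
print(score.shape)                               # torch.Size([2, 1])
```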

12.
Sensors (Basel) ; 24(6)2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38544251

ABSTRACT

Restricted mouth opening (trismus) is one of the most common complications following head and neck cancer treatment. Early initiation of mouth-opening exercises is crucial for preventing or minimizing trismus. Current methods for these exercises predominantly involve finger exercises and traditional mouth-opening training devices. Our research group designed an intelligent mouth-opening training device (IMOTD) that addresses the limitations of traditional home training methods: the inability to quantify mouth-opening exercises, a lack of guided training resulting in temporomandibular joint injuries, and poor training continuity leading to poor training outcomes. The device introduces an interactive remote guidance mode to address these concerns and was designed with a focus on the safety and effectiveness of medical devices. The accuracy of the training data was verified through piezoelectric sensor calibration. Through mechanical analysis, the stress points of the structure were identified, and finite element analysis of the connecting rod and the occlusal plate connection structure was conducted to ensure the safety of the device. Preclinical experiments support the effectiveness of the intelligent device in rehabilitation compared with conventional mouth-opening training methods. The device facilitates the quantification and visualization of mouth-opening training indicators, ensuring both the comfort and safety of the training process. Additionally, it enables remote supervision and guidance for patient training, thereby enhancing patient compliance and ultimately ensuring the effectiveness of mouth-opening exercises.


Subjects
Head and Neck Neoplasms , Trismus , Humans , Trismus/etiology , Trismus/rehabilitation , Exercise Therapy/methods , Exercise , Mouth
13.
Comput Biol Med ; 171: 108212, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38422967

ABSTRACT

BACKGROUND: Deep learning-based super-resolution (SR) algorithms aim to reconstruct low-resolution (LR) images into high-fidelity high-resolution (HR) images by learning the low- and high-frequency information. In medical application scenarios, high-quality reconstruction of LR digital medical images fulfills experts' diagnostic requirements. PURPOSE: Medical image SR algorithms should support arbitrary resolution and high efficiency in applications; however, no relevant study is currently available. Several SR studies on natural images have accomplished reconstruction at unrestricted resolutions, but these methods are difficult to apply to medical settings because their large model sizes significantly limit efficiency. Hence, we propose a highly efficient method for reconstructing medical images at any desired resolution. METHODS: The statistical features of medical images exhibit greater continuity across neighboring pixels than those of natural images, making medical images comparatively less challenging to reconstruct. Utilizing this property, we develop a neighborhood evaluator to represent the continuity of the neighborhood while controlling the network's depth. RESULTS: The proposed method shows superior performance across seven reconstruction scales, as evidenced by experiments conducted on panoramic radiographs and two external public datasets. Furthermore, the proposed network decreases the parameter count by over 20× and the computational workload by over 10× compared with prior work. For large-scale reconstruction, inference speed can be improved by over 5×. CONCLUSION: The proposed SR strategy performs efficient medical image reconstruction at arbitrary resolutions, marking a significant breakthrough in the field. The scheme facilitates the implementation of SR on mobile medical platforms.


Subjects
Algorithms , Image Processing, Computer-Assisted , Image Processing, Computer-Assisted/methods
14.
Comput Biol Med ; 170: 108067, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38301513

ABSTRACT

BACKGROUND: Ocular Adnexal Lymphoma (OAL) is a non-Hodgkin's lymphoma that most often appears in the tissues near the eye, and radiotherapy is currently the preferred treatment. Because the prognostic factors for systemic failure of OAL radiotherapy remain controversial, thorough evaluation prior to radiotherapy is highly recommended to improve patient prognosis and minimize the likelihood of adverse effects. PURPOSE: To investigate the risk factors that contribute to incomplete remission in OAL radiotherapy and to establish a hybrid model for predicting radiotherapy outcomes in OAL patients. METHODS: A retrospective chart review was performed for 87 consecutive patients with OAL who received radiotherapy between February 2011 and August 2022 in our center. Seven image features, derived from MRI sequences, were integrated with 122 clinical features to form comprehensive patient feature sets. Chemometric algorithms were then employed to distill highly informative features from these sets. Based on these refined features, SVM and XGBoost classifiers were trained to classify the effect of radiotherapy. RESULTS: The clinical records of 87 OAL patients (median age: 60 months, IQR: 52-68 months; 62.1% male) treated with radiotherapy were reviewed. Analysis with the Lasso (AUC = 0.75, 95% CI: 0.72-0.77) and Random Forest (AUC = 0.67, 95% CI: 0.62-0.70) algorithms revealed four potential features, resulting in an intersection AUC of 0.80 (95% CI: 0.75-0.82). Logistic Regression (AUC = 0.75, 95% CI: 0.72-0.77) identified two features. Furthermore, the integration of chemometric methods such as CARS (AUC = 0.66, 95% CI: 0.62-0.72), UVE (AUC = 0.71, 95% CI: 0.66-0.75), and GA (AUC = 0.65, 95% CI: 0.60-0.69) highlighted six features in total, with an intersection AUC of 0.82 (95% CI: 0.78-0.83). These features included enophthalmos, diplopia, tenderness, elevated ALT count, HBsAg positivity, and CD43 positivity in immunohistochemical tests. CONCLUSION: The findings suggest the effectiveness of chemometric algorithms in pinpointing OAL risk factors, and the proposed prediction model shows promise in helping clinicians identify OAL patients likely to achieve complete remission via radiotherapy. Notably, patients with a history of enophthalmos, diplopia, tenderness, elevated ALT levels, HBsAg positivity, or CD43 positivity are less likely to attain complete remission after radiotherapy. These insights offer more targeted management strategies for OAL patients. The developed model is accessible online at: https://lzz.testop.top/.
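
Schematically, the pipeline pairs a feature selector with a classifier; the sketch below uses Lasso (one of the selectors named above) feeding an SVM on synthetic data, purely as an illustration of the two-stage design, not the study's exact procedure.

```python
# Two-stage sketch: Lasso-based feature selection, then SVM classification.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(87, 129))           # 87 patients; 122 clinical + 7 image
y = rng.integers(0, 2, size=87)          # complete remission: yes/no

lasso = make_pipeline(StandardScaler(), LassoCV(cv=5)).fit(X, y)
selected = np.flatnonzero(lasso[-1].coef_)   # features Lasso kept
if selected.size == 0:                       # fall back if Lasso drops all
    selected = np.arange(X.shape[1])

clf = make_pipeline(StandardScaler(), SVC(probability=True))
clf.fit(X[:, selected], y)
print(len(selected), clf.predict(X[:5, selected]))
```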


Subjects
Eye Neoplasms , Lymphoma, Non-Hodgkin , Humans , Male , Child, Preschool , Female , Retrospective Studies , Chemometrics , Diplopia , Hepatitis B Surface Antigens , Eye Neoplasms/diagnostic imaging , Eye Neoplasms/radiotherapy , Lymphoma, Non-Hodgkin/diagnostic imaging , Lymphoma, Non-Hodgkin/radiotherapy , Lymphoma, Non-Hodgkin/pathology , Algorithms
15.
IEEE Trans Pattern Anal Mach Intell ; 46(8): 5852-5872, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38376963

ABSTRACT

Video compression is indispensable to most video analysis systems. Although it saves transmission bandwidth, it also degrades downstream video understanding tasks, especially at low-bitrate settings. To systematically investigate this problem, we first thoroughly review previous methods, revealing that three principles, i.e., task-decoupled, label-free, and data-emerged semantic priors, are critical to a machine-friendly coding framework but have not been fully satisfied so far. In this paper, we propose a traditional-neural mixed coding framework that simultaneously fulfills all these principles by taking advantage of both traditional codecs and neural networks (NNs). On one hand, traditional codecs can efficiently encode the pixel signal of videos but may distort the semantic information. On the other hand, highly non-linear NNs are proficient in condensing video semantics into a compact representation. The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved w.r.t. the coding procedure, which is spontaneously learned from unlabeled data in a self-supervised manner. The videos collaboratively decoded from the two streams (codec and NN) are semantically rich as well as visually photo-realistic, empirically boosting performance on several mainstream downstream video analysis tasks without any post-adaptation procedure. Furthermore, by introducing an attention mechanism and an adaptive modeling scheme, the video semantic modeling ability of our approach is further enhanced. Finally, we build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach. All codes, data, and models will be open-sourced to facilitate future research.

16.
Phenomics ; 3(5): 469-484, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37881321

ABSTRACT

Thyroid cancer, a common endocrine malignancy, is one of the leading causes of death among endocrine tumors. Diagnosis by pathological section analysis suffers from delays and cumbersome operating procedures. We therefore construct models based on spectral data that can potentially be used for rapid intraoperative papillary thyroid carcinoma (PTC) diagnosis and for characterizing PTC. To alleviate any concerns pathologists may have about using the models, we analyzed the wavelength bands used, which can be interpreted pathologically. A spectra acquisition system was first built to acquire spectra of pathological section images from 91 patients. The obtained spectral dataset contains 217 spectra of normal thyroid tissue and 217 spectra of PTC tissue. Clinical data of the corresponding patients were collected for subsequent model interpretability analysis. The experiment was approved by the Ethics Review Committee of the Wuhu Hospital of East China Normal University. Spectral preprocessing was applied to the spectra, and the preprocessed signal, optimized respectively by first and secondary informative wavelength selection, was used to develop the PTC detection models. The PTC detection model using mean centering (MC) and multiplicative scatter correction (MSC) performed best, and the reasons for this performance were analyzed in combination with the spectral acquisition process and the composition of the test slide. For interpretability, the near-ultraviolet band selected for modeling corresponds to the location of the amino acid absorption peak, consistent with the clinical finding of significantly lower amino acid concentrations in PTC patients. Moreover, the hemoglobin absorption peak selected for modeling is consistent with the low hemoglobin index in PTC patients. In addition, correlation analysis between the selected wavelengths and the clinical data shows that the reflection intensity of the selected wavelengths in normal cells has a moderate correlation with cell arrangement structure, nucleus size, and free thyroxine (FT4), and a strong correlation with triiodothyronine (T3); the reflection intensity of the selected bands in PTC cells has a moderate correlation with free triiodothyronine (FT3).
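
Mean centering and multiplicative scatter correction have standard chemometrics definitions; a short sketch with synthetic spectra (not the study's data or exact pipeline):

```python
# MC removes the per-wavelength mean; MSC regresses each spectrum against a
# reference (here the mean spectrum) and removes scatter offset and slope.
import numpy as np

def mean_center(spectra):
    return spectra - spectra.mean(axis=0)

def msc(spectra):
    ref = spectra.mean(axis=0)             # reference = mean spectrum
    out = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        b, a = np.polyfit(ref, s, 1)       # fit s = b * ref + a
        out[i] = (s - a) / b               # undo multiplicative scatter
    return out

spectra = np.random.rand(10, 434)          # 10 spectra, 434 wavelengths
print(mean_center(spectra).shape, msc(spectra).shape)
```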

17.
Comput Biol Med ; 165: 107344, 2023 10.
Article in English | MEDLINE | ID: mdl-37603961

ABSTRACT

Medical record images in EHR systems contain private user data and constitute a valuable asset, so there is an urgent need to protect them. Image steganography can offer a potential solution. We therefore develop a steganographic model for medical record images based on StegaStamp. In contrast to natural images, medical record images are document images, which are very vulnerable to image cropping attacks. Therefore, we use text region segmentation and watermark region localization to combat cropping attacks. The distortion network is designed to account for the distortion that can occur during the transmission of medical record images, making the model robust against communication-induced distortions. In addition, building on StegaStamp, we introduce FISM as part of the loss function to reduce ripple textures in the steganographic image. The experimental results show that the designed distortion network and the FISM loss term are well suited to the steganographic task for medical record images in terms of both decoding accuracy and image quality.


Subjects
Confidentiality , Medical Records , Medical Informatics
18.
Sensors (Basel) ; 23(10)2023 May 13.
Article in English | MEDLINE | ID: mdl-37430638

ABSTRACT

New CMOS imaging sensor (CIS) techniques in smartphones have helped user-generated content displace traditional DSLRs in our daily lives. However, tiny sensor sizes and fixed focal lengths also lead to grainier details, especially in zoom photos. Moreover, multi-frame stacking and post-sharpening algorithms produce zigzag textures and over-sharpened appearances, whose quality traditional image-quality metrics may over-estimate. To solve this problem, we first construct a real-world zoom photo database, which includes 900 tele-photos from 20 different mobile sensors and ISPs. We then propose a novel no-reference zoom quality metric which incorporates a traditional estimate of sharpness and the concept of image naturalness. More specifically, for the measurement of image sharpness, we are the first to combine the total energy of the predicted gradient image with the entropy of the residual term under the framework of free-energy theory. To further compensate for the over-sharpening effect and other artifacts, a set of model parameters of mean subtracted contrast normalized (MSCN) coefficients are utilized as natural statistics representatives. Finally, these two measures are combined linearly. Experimental results on the zoom photo database demonstrate that our quality metric achieves SROCC and PLCC over 0.91, while the performance of a single sharpness or naturalness index is around 0.85. Moreover, compared with the best tested general-purpose and sharpness models, our zoom metric outperforms them by 0.072 and 0.064 in SROCC, respectively.
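
The MSCN coefficients used as naturalness features follow the standard BRISQUE-style definition with Gaussian local statistics; a sketch with conventional parameter choices (our assumption, not the paper's exact settings):

```python
# MSCN: subtract a local Gaussian mean and divide by a local contrast map.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7 / 6, eps=1.0):
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)                  # local mean
    var = gaussian_filter(image ** 2, sigma) - mu ** 2  # local variance
    sigma_map = np.sqrt(np.abs(var))                    # local contrast
    return (image - mu) / (sigma_map + eps)             # MSCN coefficients

img = np.random.rand(128, 128) * 255
coeffs = mscn(img)
print(coeffs.mean(), coeffs.std())   # near-zero mean, sub-unit std
```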

19.
IEEE Trans Image Process ; 32: 3847-3861, 2023.
Article in English | MEDLINE | ID: mdl-37428674

ABSTRACT

In recent years, User Generated Content (UGC) has grown dramatically in video sharing applications. It is necessary for service providers to use video quality assessment (VQA) to monitor and control users' Quality of Experience when watching UGC videos. However, most existing UGC VQA studies only focus on the visual distortions of videos, ignoring that the perceptual quality also depends on the accompanying audio signals. In this paper, we conduct a comprehensive study on UGC audio-visual quality assessment (AVQA) from both subjective and objective perspectives. Specifically, we construct the first UGC AVQA database, named the SJTU-UAV database, which includes 520 in-the-wild UGC audio and video (A/V) sequences collected from the YFCC100m database. A subjective AVQA experiment is conducted on the database to obtain the mean opinion scores (MOSs) of the A/V sequences. To demonstrate the content diversity of the SJTU-UAV database, we give a detailed analysis of the SJTU-UAV database as well as two other synthetically distorted AVQA databases and one authentically distorted VQA database, from both the audio and video aspects. Then, to facilitate the development of the AVQA field, we construct a benchmark of AVQA models on the proposed SJTU-UAV database and the two other AVQA databases, where the benchmark models include AVQA models designed for synthetically distorted A/V sequences and AVQA models built by combining popular VQA methods and audio features via support vector regression (SVR). Finally, because the benchmark AVQA models perform poorly in assessing in-the-wild UGC videos, we further propose an effective AVQA model that jointly learns quality-aware audio and visual feature representations in the temporal domain, an approach seldom investigated by existing AVQA models. Our proposed model outperforms the aforementioned benchmark AVQA models on the SJTU-UAV database and the two synthetically distorted AVQA databases. The SJTU-UAV database and the code of the proposed model will be released to facilitate further research.
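
The benchmark construction step, fusing video and audio features and regressing MOS with an SVR, can be sketched as below; the features and scores are random stand-ins, not SJTU-UAV data.

```python
# Fuse video-quality and audio features, then map them to MOS with an SVR.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
video_feats = rng.normal(size=(520, 32))   # e.g., from a VQA model
audio_feats = rng.normal(size=(520, 16))   # e.g., spectral audio features
mos = rng.uniform(1, 5, size=520)          # mean opinion scores

X = np.hstack([video_feats, audio_feats])
model = make_pipeline(StandardScaler(), SVR(kernel="rbf")).fit(X, mos)
print(model.predict(X[:3]))                # predicted MOS for 3 sequences
```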


Subjects
Learning , Databases, Factual , Video Recording/methods , Humans
20.
Front Neurosci ; 17: 1187619, 2023.
Article in English | MEDLINE | ID: mdl-37456990

ABSTRACT

Aim: The aim of this study is to evaluate the utility of binocular chromatic pupillometry for detecting impaired pupillary light response (PLR) in patients with primary open-angle glaucoma (POAG) and to assess the feasibility of using a binocular chromatic pupillometer for opportunistic POAG diagnosis in community-based or telemedicine-based services. Methods: In this prospective, cross-sectional study, 74 patients with POAG and 23 healthy controls were enrolled. All participants underwent comprehensive ophthalmologic examinations, including optical coherence tomography (OCT) and standard automated perimetry (SAP). The PLR tests included sequential tests of full-field chromatic stimuli weighted by rods, intrinsically photosensitive retinal ganglion cells (ipRGCs), and cones (Experiment 1), as well as an alternating chromatic light flash-induced relative afferent pupillary defect (RAPD) test (Experiment 2). In Experiment 1, the constriction amplitude, velocity, and time to maximum constriction/dilation were calculated for the three cell type-weighted responses, and the post-illumination response of the ipRGC-weighted response was evaluated. In Experiment 2, the infrared pupillary asymmetry (IPA) amplitude and anisocoria duration induced by intermittent blue or red light flashes were calculated. Results: In Experiment 1, the PLR of POAG patients was significantly reduced under all conditions, reflecting defective photoreception through rods, cones, and ipRGCs. The variable with the highest area under the receiver operating characteristic curve (AUC) was time to maximum dilation under the ipRGC-weighted stimulus, followed by the constriction amplitude under the cone-weighted stimulus and the constriction amplitude under the ipRGC-weighted stimulus. The impaired PLR features were associated with greater visual field loss, thinner retinal nerve fiber layer (RNFL) thickness, and cupping of the optic disk. In Experiment 2, IPA and anisocoria duration induced by intermittent blue or red light flashes were significantly greater in participants with POAG than in controls. IPA and anisocoria duration had good diagnostic value, correlating with the inter-eye asymmetry of visual field loss. Conclusion: We demonstrate that binocular chromatic pupillometry could potentially serve as an objective clinical tool for opportunistic glaucoma diagnosis in community-based or telemedicine-based services. Binocular chromatic pupillometry allows an accurate, objective, and rapid assessment of retinal structural impairment and functional loss in glaucomatous eyes of different severity levels.
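
Basic PLR readouts such as constriction amplitude, maximum constriction velocity, and time to maximum constriction can be derived from a pupil-diameter trace as in the following sketch; the trace and sampling rate are simulated, not study data.

```python
# PLR features from a pupil-diameter time series (mm, sampled at fs Hz).
import numpy as np

def plr_features(diameter, fs=30.0):
    baseline = diameter[: int(fs)].mean()         # 1 s pre-stimulus baseline
    trough_idx = diameter.argmin()                # maximum constriction point
    amplitude = baseline - diameter[trough_idx]   # constriction amplitude
    velocity = np.gradient(diameter, 1.0 / fs)    # mm/s
    return {"amplitude_mm": amplitude,
            "max_constriction_velocity": velocity.min(),
            "time_to_max_constriction_s": trough_idx / fs}

t = np.arange(0, 5, 1 / 30)
trace = 6 - 1.5 * np.exp(-((t - 2) ** 2) / 0.1)   # simulated light response
print(plr_features(trace))
```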
