Pesquisa | BVS IEC

1.

Utilizing online stochastic optimization on scheduling of intensity-modulate radiotherapy therapy (IMRT).

Chang, W H; Lo, Sonia M; Chen, Tzu-Li; Chen, James C; Wu, Hao-Ning.

J Biomed Inform ; 108: 103499, 2020 08.

Artigo em Inglês | MEDLINE | ID: mdl-32653620

RESUMO

According to Ministry of Health and Welfare of Taiwan, cancer has been one of the major causes of death in Taiwan since 1982. The Intensive-Modulated Radiation Therapy (IMRT) is one of the most important radiotherapies of cancers, especially for Nasopharyngeal cancers, Digestive system cancers and Cervical cancers. For patients, if they can receive the treatment at the earliest possibility while diagnosed with cancers, their survival rate increases. However, the discussion of effective patient scheduling models of IMRT to reduce patients' waiting time is still limited in literature. This study proposed a mathematical model to improve the efficiency of patient scheduling. The research was composed of two stages. In the first stage, the online stochastic algorithm was proposed to improve the performance of present scheduling system. In the second stage the impact of future treatment to reduce patients' waiting time was considered. A genetic algorithm (GA) was then proposed to solve the online stochastic scheduling problem. This research collected data from a practical medical institute and the proposed model was validated with real data. It contributes to both theory and practice by proposing a practical model to assist the medical institute in implementing patient scheduling in a more efficient manner.

Assuntos

Radioterapia de Intensidade Modulada , Algoritmos , Agendamento de Consultas , Humanos , Modelos Teóricos , Planejamento da Radioterapia Assistida por Computador , Taiwan

2.

Q-BENCH: A Benchmark for Multi-modal Foundation Models on Low-level Vision from Single Images to Pairs.

Zhang, Zicheng; Wu, Haoning; Zhang, Erli; Zhai, Guangtao; Lin, Weisi.

IEEE Trans Pattern Anal Mach Intell ; PP2024 Aug 21.

Artigo em Inglês | MEDLINE | ID: mdl-39167507

RESUMO

The rapid development of Multi-modality Large Language Models (MLLMs) has navigated a paradigm shift in computer vision, moving towards versatile foundational models. However, evaluating MLLMs in low-level visual perception and understanding remains a yet-to-explore domain. To this end, we design benchmark settings to emulate human language responses related to low-level vision: the low-level visual perception (A1) via visual question answering related to low-level attributes (e.g. clarity, lighting); and the low-level visual description (A2), on evaluating MLLMs for low-level text descriptions. Furthermore, given that pairwise comparison can better avoid ambiguity of responses and has been adopted by many human experiments, we further extend the low-level perception-related questionanswering and description evaluations of MLLMs from single images to image pairs. Specifically, for perception (A1), we carry out the LLVisionQA+ dataset, comprising 2,990 single images and 1,999 image pairs each accompanied by an open-ended question about its low-level features; for description (A2), we propose the LLDescribe+ dataset, evaluating MLLMs for low-level descriptions on 499 single images and 450 pairs. Additionally, we evaluate MLLMs on assessment (A3) ability, i.e. predicting score, by employing a softmax-based approach to enable all MLLMs to generate quantifiable quality ratings, tested against human opinions in 7 image quality assessment (IQA) datasets. With 24 MLLMs under evaluation, we demonstrate that several MLLMs have decent low-level visual competencies on single images, but only GPT-4V exhibits higher accuracy on pairwise comparisons than single image evaluations (like humans). We hope that our benchmark will motivate further research into uncovering and enhancing these nascent capabilities of MLLMs. Datasets will be available at https://github.com/Q-Future/Q-Bench.

3.

TOPIQ: A Top-Down Approach From Semantics to Distortions for Image Quality Assessment.

Chen, Chaofeng; Mo, Jiadi; Hou, Jingwen; Wu, Haoning; Liao, Liang; Sun, Wenxiu; Yan, Qiong; Lin, Weisi.

IEEE Trans Image Process ; 33: 2404-2418, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38517711

RESUMO

Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks. Inspired by the characteristics of the human visual system, existing methods typically use a combination of global and local representations (i.e., multi-scale features) to achieve superior performance. However, most of them adopt simple linear fusion of multi-scale features, and neglect their possibly complex relationship and interaction. In contrast, humans typically first form a global impression to locate important regions and then focus on local details in those regions. We therefore propose a top-down approach that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions, named as TOPIQ. Our approach to IQA involves the design of a heuristic coarse-to-fine network (CFANet) that leverages multi-scale features and progressively propagates multi-level semantic information to low-level representations in a top-down manner. A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower level features guided by higher level features. This mechanism emphasizes active semantic regions for low-level distortions, thereby improving performance. TOPIQ can be used for both Full-Reference (FR) and No-Reference (NR) IQA. We use ResNet50 as its backbone and demonstrate that TOPIQ achieves better or competitive performance on most public FR and NR benchmarks compared with state-of-the-art methods based on vision transformers, while being much more efficient (with only â¼ 13% FLOPS of the current best FR method). Codes are released at https://github.com/chaofengc/IQA-PyTorch.

4.

Blind Video Quality Prediction by Uncovering Human Video Perceptual Representation.

Liao, Liang; Xu, Kangmin; Wu, Haoning; Chen, Chaofeng; Sun, Wenxiu; Yan, Qiong; Jay Kuo, C-C; Lin, Weisi.

IEEE Trans Image Process ; 33: 4998-5013, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39236121

RESUMO

Blind video quality assessment (VQA) has become an increasingly demanding problem in automatically assessing the quality of ever-growing in-the-wild videos. Although efforts have been made to measure temporal distortions, the core to distinguish between VQA and image quality assessment (IQA), the lack of modeling of how the human visual system (HVS) relates to the temporal quality of videos hinders the precise mapping of predicted temporal scores to the human perception. Inspired by the recent discovery of the temporal straightness law of natural videos in the HVS, this paper intends to model the complex temporal distortions of in-the-wild videos in a simple and uniform representation by describing the geometric properties of videos in the visual perceptual domain. A novel videolet, with perceptual representation embedding of a few consecutive frames, is designed as the basic quality measurement unit to quantify temporal distortions by measuring the angular and linear displacements from the straightness law. By combining the predicted score on each videolet, a perceptually temporal quality evaluator (PTQE) is formed to measure the temporal quality of the entire video. Experimental results demonstrate that the perceptual representation in the HVS is an efficient way of predicting subjective temporal quality. Moreover, when combined with spatial quality metrics, PTQE achieves top performance over popular in-the-wild video datasets. More importantly, PTQE requires no additional information beyond the video being assessed, making it applicable to any dataset without parameter tuning. Additionally, the generalizability of PTQE is evaluated on video frame interpolation tasks, demonstrating its potential to benefit temporal-related enhancement tasks.

Assuntos

Processamento de Imagem Assistida por Computador , Gravação em Vídeo , Humanos , Gravação em Vídeo/métodos , Processamento de Imagem Assistida por Computador/métodos , Percepção Visual/fisiologia , Algoritmos

5.

Towards Transparent Deep Image Aesthetics Assessment with Tag-based Content Descriptors.

Hou, Jingwen; Lin, Weisi; Fang, Yuming; Wu, Haoning; Chen, Chaofeng; Liao, Liang; Liu, Weide.

IEEE Trans Image Process ; PP2023 Aug 30.

Artigo em Inglês | MEDLINE | ID: mdl-37647188

RESUMO

Deep learning approaches for Image Aesthetics Assessment (IAA) have shown promising results in recent years, but the internal mechanisms of these models remain unclear. Previous studies have demonstrated that image aesthetics can be predicted using semantic features, such as pre-trained object classification features. However, these semantic features are learned implicitly, and therefore, previous works have not elucidated what the semantic features are representing. In this work, we aim to create a more transparent deep learning framework for IAA by introducing explainable semantic features. To achieve this, we propose Tag-based Content Descriptors (TCDs), where each value in a TCD describes the relevance of an image to a human-readable tag that refers to a specific type of image content. This allows us to build IAA models from explicit descriptions of image contents. We first propose the explicit matching process to produce TCDs that adopt predefined tags to describe image contents. We show that a simple MLP-based IAA model with TCDs only based on predefined tags can achieve an SRCC of 0.767, which is comparable to most state-of-the-art methods. However, predefined tags may not be sufficient to describe all possible image contents that the model may encounter. Therefore, we further propose the implicit matching process to describe image contents that cannot be described by predefined tags. By integrating components obtained from the implicit matching process into TCDs, the IAA model further achieves an SRCC of 0.817, which significantly outperforms existing IAA methods. Both the explicit matching process and the implicit matching process are realized by the proposed TCD generator. To evaluate the performance of the proposed TCD generator in matching images with predefined tags, we also labeled 5101 images with photography-related tags to form a validation set. And experimental results show that the proposed TCD generator can meaningfully assign photography-related tags to images.

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA