ABSTRACT
For the past century, the nucleus has been the focus of extensive investigations in cell biology. However, many questions remain about how its shape and size are regulated during development, in different tissues, or during disease and aging. To track these changes, microscopy has long been the tool of choice. Image analysis has revolutionized this field of research by providing computational tools that can be used to translate qualitative images into quantitative parameters. Many tools have been designed to delimit objects in 2D and, more recently, in 3D in order to define their shape, number or position in nuclear space. Today, the field is driven by deep-learning methods, most of which take advantage of convolutional neural networks. These techniques are remarkably well suited to biomedical images when trained using large datasets and powerful computer graphics cards. To promote these innovative and promising methods among cell biologists, this Review summarizes the main concepts and terminology of deep learning. Special emphasis is placed on the availability of these methods. We highlight why the quality and characteristics of training image datasets are important and where to find them, as well as how to create, store and share image datasets. Finally, we describe deep-learning methods well suited to the 3D analysis of nuclei and classify them according to their level of usability for biologists. Out of more than 150 published methods, we identify fewer than 12 that biologists can use, and we explain why this is the case. Based on this experience, we propose best practices for sharing deep-learning methods with biologists.
Subjects
Deep Learning; Cell Nucleus; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional; Microscopy/methods; Neural Networks, Computer
ABSTRACT
In this manuscript, an attentive dual residual generative adversarial network optimized using the wild horse optimization algorithm for brain tumor detection (ADRGAN-WHOA-BTD) is proposed. The input images are gathered from the BraTS, RemBRANDT, and Figshare datasets. Initially, the images are preprocessed to improve their quality and remove unwanted noise; preprocessing is performed with the dual-tree complex wavelet transform (DTCWT). Image features such as geodesic data, together with texture features such as contrast, energy, correlation, homogeneity, and entropy, are extracted using a multilayer DenseNet. The extracted features are then passed to the attentive dual residual generative adversarial network (ADRGAN) classifier to classify the brain images. The ADRGAN weight parameters are tuned using the wild horse optimization algorithm (WHOA). The proposed method is implemented in MATLAB. For the BraTS dataset, the ADRGAN-WHOA-BTD method achieved accuracy, sensitivity, specificity, F-measure, precision, and error rates of 99.85%, 99.82%, 98.92%, 99.76%, 99.45%, and 0.15%, respectively. The proposed technique also demonstrated a runtime of 13 s, significantly outperforming existing methods.
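As background to the texture features named above (contrast, energy, correlation, homogeneity, entropy), the following is a minimal sketch of how such features can be computed from a gray-level co-occurrence matrix with scikit-image. It does not reproduce the paper's DTCWT preprocessing or DenseNet-based extractor, and the input file name is hypothetical.

```python
# Illustrative sketch (not the paper's pipeline): GLCM texture features with scikit-image.
import numpy as np
from skimage import io, img_as_ubyte
from skimage.feature import graycomatrix, graycoprops

# Hypothetical grayscale brain slice loaded as 8-bit intensities.
image = img_as_ubyte(io.imread("brain_slice.png", as_gray=True))

# Co-occurrence matrix over several offsets and angles, symmetric and normalized.
glcm = graycomatrix(image, distances=[1, 2], angles=[0, np.pi / 4, np.pi / 2],
                    levels=256, symmetric=True, normed=True)

features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "energy", "correlation", "homogeneity")}
# Shannon entropy of the normalized GLCM as a simple entropy feature.
features["entropy"] = float(-np.sum(glcm * np.log2(glcm + 1e-12)))
print(features)
```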
ABSTRACT
BACKGROUND: Data collected from hospitals are usually only partially annotated by radiologists due to time constraints. Developing and evaluating deep learning models on these data may result in over- or underestimation of performance. PURPOSE: We aimed to quantitatively investigate how the percentage of annotated lesions in CT images influences the performance of universal lesion detection (ULD) algorithms. METHODS: We trained a multi-view feature pyramid network with position-aware attention (MVP-Net) to perform ULD. Three versions of the DeepLesion dataset were created for training MVP-Net. The Original DeepLesion Dataset (OriginalDL) is the publicly available, widely studied DeepLesion dataset, which includes 32 735 lesions in 4427 patients that were partially labeled during routine clinical practice. The Enriched DeepLesion Dataset (EnrichedDL) is an enhanced dataset featuring fully labeled lesions at one or more time points for 4145 patients, with 34 317 lesions. UnionDL is the union of OriginalDL and EnrichedDL, with 54 510 labeled lesions in 4427 patients. Each dataset was used separately to train MVP-Net, resulting in the following models: OriginalCNN (replicating the original result), EnrichedCNN (testing the effect of increased annotation), and UnionCNN (featuring the greatest number of annotations). RESULTS: Although the reported mean sensitivity of OriginalCNN was 84.3% on the OriginalDL testing set, performance fell sharply when tested on the EnrichedDL testing set, yielding mean sensitivities of 56.1%, 66.0%, and 67.8% for OriginalCNN, EnrichedCNN, and UnionCNN, respectively. We also found that increasing the percentage of annotated lesions in the training set increased sensitivity, but the marginal gain in performance gradually diminished according to a power law. CONCLUSIONS: We expanded and improved the existing DeepLesion dataset by annotating an additional 21 775 lesions, and we demonstrated that using fully labeled CT images avoided overestimation of MVP-Net's performance while increasing the algorithm's sensitivity, which may have a substantial impact on future CT lesion detection research. The annotated lesions are available at https://github.com/ComputationalImageAnalysisLab/DeepLesionData.
Subjects
Algorithms; Deep Learning; Radiographic Image Interpretation, Computer-Assisted; Tomography, X-Ray Computed; Humans; Tomography, X-Ray Computed/methods; Radiographic Image Interpretation, Computer-Assisted/methods; Image Processing, Computer-Assisted/methods; Neoplasms/diagnostic imaging; Databases, Factual; Neural Networks, Computer
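The abstract's observation that sensitivity grows with the fraction of annotated lesions, but with diminishing returns following a power law, can be illustrated by fitting y = a·x^b to (annotation fraction, sensitivity) pairs. The following sketch uses placeholder numbers, not the paper's data.

```python
# Illustrative sketch: fitting a power law y = a * x**b to sensitivity vs. annotated fraction.
# The data points below are placeholders, not values from the paper.
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b):
    return a * np.power(x, b)

annotated_fraction = np.array([0.2, 0.4, 0.6, 0.8, 1.0])       # fraction of lesions annotated
mean_sensitivity = np.array([0.45, 0.54, 0.60, 0.64, 0.67])    # hypothetical sensitivities

params, _ = curve_fit(power_law, annotated_fraction, mean_sensitivity, p0=(0.6, 0.3))
a, b = params
print(f"fit: sensitivity ~ {a:.3f} * fraction^{b:.3f}")
# An exponent b < 1 reproduces the diminishing marginal gain described in the abstract.
```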
ABSTRACT
Deformations introduced during the production of plastic components degrade the accuracy of their 3D geometric information, a critical aspect of object inspection processes. This phenomenon is prevalent among primary plastic products from manufacturers. This work proposes a solution for estimating the deformation of textureless plastic objects using only a single RGB image. The solution encompasses a unique image dataset of five deformed parts, a novel mesh-label generation method based on sequential deformation, and a training model based on graph convolution. The proposed sequential deformation method outperforms the prevalent chamfer distance algorithm in generating precise mesh labels. The training model projects object vertices onto features extracted from the input image and then predicts vertex location offsets based on the projected features. The meshes predicted using these offsets achieve sub-millimeter accuracy on synthetic images and approximately 2.0 mm accuracy on real images.
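Since the abstract compares its sequential-deformation labeling against the chamfer distance algorithm, the following is a minimal sketch of a symmetric chamfer distance between two point sets using a k-d tree. The point clouds here are random placeholders standing in for predicted and ground-truth mesh vertices, not the paper's data.

```python
# Illustrative sketch: symmetric chamfer distance between two 3D point sets.
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """Mean nearest-neighbour distance from A to B plus from B to A."""
    dist_a_to_b, _ = cKDTree(points_b).query(points_a)  # for each point in A, nearest in B
    dist_b_to_a, _ = cKDTree(points_a).query(points_b)  # for each point in B, nearest in A
    return float(dist_a_to_b.mean() + dist_b_to_a.mean())

# Placeholder point clouds standing in for predicted and ground-truth mesh vertices.
rng = np.random.default_rng(0)
predicted_vertices = rng.random((1000, 3))
ground_truth_vertices = rng.random((1200, 3))
print(f"chamfer distance: {chamfer_distance(predicted_vertices, ground_truth_vertices):.4f}")
```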
ABSTRACT
With the increase in both global warming and the human population, forest fires have become a major global concern. They can lead to climatic shifts and contribute to the greenhouse effect, among other adverse outcomes, and, strikingly, human activities cause a disproportionate number of them. Fast, accurate detection is the key to controlling such unexpected events. To address this, we propose an improved forest fire detection method that classifies fires based on a new version of the Detectron2 platform (a ground-up rewrite of the Detectron library) using deep learning approaches. A custom dataset of 5200 images was created and labeled for training, and the resulting model achieved higher precision than the other models across various experimental scenarios. The proposed model can detect small fires over long distances during the day and night; an advantage of the Detectron2 algorithm is its long-distance detection of the object of interest. The experimental results show that the proposed forest fire detection method successfully detects fires with an improved precision of 99.3%.
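As an illustration of how a Detectron2 model can be fine-tuned on a custom labeled dataset of the kind described above, the following sketch registers a COCO-format fire dataset and trains a standard detector. The dataset names, file paths, class count, and hyper-parameters are hypothetical, and the exact configuration used in the paper is not reproduced.

```python
# Illustrative sketch: fine-tuning a Detectron2 detector on a custom COCO-format fire dataset.
# Dataset names, paths and hyper-parameters below are hypothetical.
import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

register_coco_instances("fire_train", {}, "fire/annotations_train.json", "fire/images/train")
register_coco_instances("fire_val", {}, "fire/annotations_val.json", "fire/images/val")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("fire_train",)
cfg.DATASETS.TEST = ("fire_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1   # a single "fire" class, as an assumption
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 3000
cfg.OUTPUT_DIR = "./output_fire"
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```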
ABSTRACT
Authorities and policymakers in Korea have recently prioritized improving fire prevention and emergency response, and governments seek to enhance community safety by constructing automated fire detection and identification systems. This study examined the efficacy of YOLOv6, an object identification system running on an NVIDIA GPU platform, for identifying fire-related items. Using metrics such as object identification speed and accuracy, as well as time-sensitive real-world applications, we analyzed the influence of YOLOv6 on fire detection and identification efforts in Korea. We conducted trials using a fire dataset comprising 4000 photos collected from Google, YouTube, and other resources to evaluate the viability of YOLOv6 for fire recognition and detection tasks. According to the findings, YOLOv6's object identification performance was 0.98, with a typical recall of 0.96 and a precision of 0.83; the system achieved an MAE of 0.302%. These findings suggest that YOLOv6 is an effective technique for detecting and identifying fire-related items in photos in Korea. Multi-class object recognition using random forests, k-nearest neighbors, support vector machines, logistic regression, naive Bayes, and XGBoost was performed on the SFSC data to evaluate the system's capacity to identify fire-related objects. The results show that for fire-related objects, XGBoost achieved the highest object identification accuracy, with values of 0.717 and 0.767, followed by random forest, with values of 0.468 and 0.510. Finally, we tested YOLOv6 in a simulated fire evacuation scenario to gauge its practicality in emergencies; the results show that YOLOv6 can accurately identify fire-related items in real time, within a response time of 0.66 s. YOLOv6 is therefore a viable option for fire detection and recognition in Korea, with the XGBoost classifier providing the highest object identification accuracy. These properties make YOLOv6 an effective tool for fire detection and identification initiatives.
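The classifier comparison described above can be reproduced in outline with scikit-learn and the xgboost package, as in the sketch below. The SFSC data is not available here, so random placeholder features and labels are used; the point is only to show how the classifiers named in the abstract can be compared under a common accuracy metric.

```python
# Illustrative sketch: comparing the classifiers named in the abstract on placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier  # requires the separate xgboost package

rng = np.random.default_rng(42)
X = rng.random((1000, 32))             # placeholder features (not the SFSC data)
y = rng.integers(0, 4, size=1000)      # placeholder labels for 4 fire-related object classes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "svm": SVC(kernel="rbf"),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "xgboost": XGBClassifier(n_estimators=200, eval_metric="mlogloss"),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: accuracy = {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```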
ABSTRACT
Smart farming (SF) applications rely on robust and accurate computer vision systems. An important computer vision task in agriculture is semantic segmentation, which aims to classify each pixel of an image and can be used, for example, for selective weed removal. State-of-the-art implementations use convolutional neural networks (CNNs) trained on large image datasets. In agriculture, publicly available RGB image datasets are scarce and often lack detailed ground-truth information. In other research areas, by contrast, RGB-D datasets that combine color (RGB) with additional distance (D) information are available, and results there show that including distance as an additional modality can further improve model performance. We therefore introduce WE3DS as the first RGB-D image dataset for multi-class plant species semantic segmentation in crop farming. It contains 2568 RGB-D images (color image and distance map) and corresponding hand-annotated ground-truth masks. Images were taken under natural light conditions using an RGB-D sensor consisting of two RGB cameras in a stereo setup. Further, we provide a benchmark for RGB-D semantic segmentation on the WE3DS dataset and compare it with a solely RGB-based model. Our trained models achieve up to 70.7% mean Intersection over Union (mIoU) for discriminating between soil, seven crop species, and ten weed species. Finally, our work confirms the finding that additional distance information improves segmentation quality.
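For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of computing mean Intersection over Union from a confusion matrix for an 18-class problem (soil plus seven crops plus ten weeds). The label maps are random placeholders, not WE3DS annotations.

```python
# Illustrative sketch: mean Intersection over Union (mIoU) from a confusion matrix.
import numpy as np

def mean_iou(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int) -> float:
    """Per-class IoU = TP / (TP + FP + FN), averaged over classes present in the data."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (y_true.ravel(), y_pred.ravel()), 1)
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou))

# Placeholder label maps: soil + 7 crop species + 10 weed species = 18 classes.
rng = np.random.default_rng(0)
gt = rng.integers(0, 18, size=(4, 256, 256))
pred = rng.integers(0, 18, size=(4, 256, 256))
print(f"mIoU: {mean_iou(gt, pred, num_classes=18):.3f}")
```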
ABSTRACT
Traditional identification methods for Papaver somniferum and Papaver rhoeas (PSPR) are time- and labor-consuming, require strict experimental conditions, and usually damage the plant. This work presents a novel method for fast, accurate, and nondestructive identification of PSPR. First, to fill the gap in PSPR data, we construct a PSPR visible-capsule image dataset. Second, we propose a modified MobileNetV3-Small network with transfer learning, which addresses the low classification accuracy and slow model convergence caused by the small number of PSPR capsule image samples. Experimental results demonstrate that the modified MobileNetV3-Small is effective for fast, accurate, and nondestructive PSPR classification.
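The following is a minimal sketch of the kind of transfer learning described: loading an ImageNet-pretrained MobileNetV3-Small from torchvision and replacing the final classifier layer for a two-class PSPR problem. It is not the authors' modified network; the frozen layers, dummy batch, and training details are assumptions for illustration only.

```python
# Illustrative sketch: transfer learning with torchvision's MobileNetV3-Small for 2 classes.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 2  # Papaver somniferum vs. Papaver rhoeas
model = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights.IMAGENET1K_V1)

# Freeze the pretrained feature extractor and replace only the final classification layer.
for param in model.features.parameters():
    param.requires_grad = False
in_features = model.classifier[3].in_features
model.classifier[3] = nn.Linear(in_features, num_classes)

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch standing in for PSPR capsule images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```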
ABSTRACT
In-flight system failure is one of the major safety concerns in the operation of unmanned aerial vehicles (UAVs) in urban environments. To address this concern, a safety framework consisting of the following three main tasks can be utilized: (1) monitoring the health of the UAV and detecting failures, (2) finding potential safe landing spots in case a critical failure is detected in step 1, and (3) steering the UAV to a safe landing spot found in step 2. In this paper, we focus on the second task and investigate the feasibility of utilizing object detection methods to spot safe landing locations in case the UAV suffers an in-flight failure. In particular, we investigate different versions of the YOLO object detection method and compare their performance for the specific application of detecting a safe landing location for a UAV that has suffered an in-flight failure. We compare the performance of YOLOv3, YOLOv4, and YOLOv5l while training them on a large aerial image dataset called DOTA, both on a personal computer (PC) and on a companion computer (CC). We plan to run the chosen algorithm on a CC that can be attached to a UAV, and the PC is used to verify the trends that we see between the algorithms on the CC. We confirm the feasibility of utilizing these algorithms for effective emergency landing spot detection and report their accuracy and speed for that specific application. Our investigation also shows that the YOLOv5l algorithm outperforms YOLOv4 and YOLOv3 in detection accuracy while maintaining a slightly slower inference speed.
Subjects
Algorithms; Unmanned Aerial Devices
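As a small sketch of running one of the compared detectors, the snippet below loads YOLOv5l through torch.hub and applies it to an aerial frame. The weights loaded here are the default COCO weights, not the DOTA-trained models from the paper, and the image path and confidence threshold are hypothetical.

```python
# Illustrative sketch: running a YOLOv5l model on an aerial image via torch.hub.
# Default COCO weights are used here, not the paper's DOTA-trained models.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5l", pretrained=True)
model.conf = 0.4  # confidence threshold for reported detections (assumed value)

results = model("aerial_frame.jpg")      # hypothetical aerial image
detections = results.pandas().xyxy[0]    # one DataFrame per image: boxes, scores, classes
print(detections[["name", "confidence", "xmin", "ymin", "xmax", "ymax"]])
```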
ABSTRACT
Magnetic resonance (MR) imaging is an important computer-aided diagnosis technique with rich pathological information, but physical and physiological constraints seriously limit its applicability. Computed tomography (CT)-based radiotherapy is therefore more popular on account of its imaging rapidity and environmental simplicity. It is thus of great theoretical and practical significance to design a method that can construct an MR image from the corresponding CT image. In this paper, we treat MR imaging as a machine vision problem and propose a multi-conditional constraint generative adversarial network (GAN) for MR imaging from CT scan data. Considering the reversibility of GANs, a generator and a reverse generator are designed for MR and CT imaging, respectively; they constrain each other and improve the consistency between features of CT and MR images. In addition, we treat the discrimination of real and generated MR images as an object re-identification problem, and a cosine error fused with the original GAN loss is designed to enhance the verisimilitude and textural features of the MR image. Experimental results on a challenging public CT-MR image dataset show distinct performance improvements over other GANs used in medical imaging and demonstrate the effectiveness of our method for medical image modality transformation.
Subjects
Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Diagnosis, Computer-Assisted; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Radionuclide Imaging; Tomography, X-Ray Computed/methods
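The abstract describes fusing a cosine error with the original GAN loss so that real/generated MR discrimination resembles re-identification. The sketch below shows one way such a combined generator objective can be written in PyTorch; the tensor shapes, loss weight, and embedding source are assumptions, and this is an interpretation of the idea rather than the authors' implementation.

```python
# Illustrative sketch: fusing a cosine embedding error with an adversarial loss term.
# Shapes, weighting and the embedding source are assumptions, not the authors' code.
import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()       # standard GAN loss on discriminator logits
cosine_criterion = nn.CosineEmbeddingLoss()  # re-identification-style feature similarity

batch = 4
disc_logits_fake = torch.randn(batch, 1)            # discriminator output for generated MR
real_mr_features = torch.randn(batch, 256)          # embedding of real MR images (placeholder)
fake_mr_features = torch.randn(batch, 256, requires_grad=True)  # embedding of generated MR

adv_loss = adv_criterion(disc_logits_fake, torch.ones_like(disc_logits_fake))
# target = 1 asks the two embeddings to point in the same direction (cosine similarity -> 1).
cos_loss = cosine_criterion(fake_mr_features, real_mr_features, torch.ones(batch))

lambda_cos = 1.0  # hypothetical weighting of the cosine term
generator_loss = adv_loss + lambda_cos * cos_loss
generator_loss.backward()
```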
ABSTRACT
BACKGROUND: Artificial intelligence (AI) research is highly dependent on the nature of the data available. With the steady increase of AI applications in the medical field, the demand for quality medical data is increasing significantly. We describe here the development of a platform for providing and sharing digital pathology data with AI researchers, and highlight challenges to overcome in operating a sustainable platform in conjunction with pathologists. METHODS: Over 3000 pathological slides from five organs (liver, colon, prostate, pancreas and biliary tract, and kidney), from tumor cases histologically confirmed by the pathology departments at three hospitals, were selected for the dataset. After digitizing the slides, tumor areas were annotated and overlaid onto the images by pathologists as the ground truth for AI training. To reduce the pathologists' workload, AI-assisted annotation was established in collaboration with university AI teams. RESULTS: A web-based data sharing platform was developed in 2019 to share massive pathological image data. This platform includes 3100 images and 5 pre-processing algorithms that allow AI researchers to easily load images into their learning models. DISCUSSION: Because privacy-protection regulations differ among countries, when releasing internationally shared learning platforms it is considered most prudent to obtain consent from patients during data acquisition. CONCLUSIONS: Despite the limitations encountered during platform development and model training, the present medical image sharing platform can steadily fulfill the high demand of AI developers for quality data. This study is expected to help other researchers intending to build similar platforms that are more effective and accessible in the future.
Subjects
Artificial Intelligence; Neoplasms; Algorithms; Humans; Male
ABSTRACT
Aerial scene recognition is a fundamental visual task that has attracted increasing research interest in the last few years. Most current research focuses on categorizing an aerial image with a single scene-level label, whereas in real-world scenarios a single image often contains multiple scenes. In this paper, we therefore take a step forward to a more practical and challenging task, namely multi-scene recognition in single images. Moreover, manually producing annotations for such a task is extraordinarily time- and labor-consuming. To address this, we propose a prototype-based memory network that recognizes multiple scenes in a single image by leveraging massive well-annotated single-scene images. The proposed network consists of three key components: 1) a prototype learning module, 2) a prototype-inhabiting external memory, and 3) a multi-head attention-based memory retrieval module. More specifically, we first learn the prototype representation of each aerial scene from single-scene aerial image datasets and store it in an external memory. Afterwards, a multi-head attention-based memory retrieval module is devised to retrieve scene prototypes relevant to a query multi-scene image for the final prediction. Notably, only a limited number of annotated multi-scene images are needed in the training phase. To facilitate progress in aerial scene recognition, we produce a new multi-scene aerial image (MAI) dataset. Experimental results on various dataset configurations demonstrate the effectiveness of our network. Our dataset and code are publicly available.
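The retrieval idea described above can be sketched with PyTorch's built-in multi-head attention: a query image embedding attends over a bank of stored scene prototypes. The embedding size, number of prototypes, and number of heads below are placeholders, not the paper's values, and the query stands in for a CNN backbone output.

```python
# Illustrative sketch: attention-based retrieval of scene prototypes from an external memory.
# Embedding size, number of prototypes and heads are placeholders, not the paper's values.
import torch
import torch.nn as nn

embed_dim, num_prototypes, num_heads = 256, 30, 4

# External memory holding one learned prototype per single-scene category.
prototype_memory = nn.Parameter(torch.randn(num_prototypes, embed_dim))
attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Query: embedding of a multi-scene aerial image (placeholder for a CNN backbone output).
query = torch.randn(8, 1, embed_dim)                      # (batch, 1 query token, dim)
memory = prototype_memory.unsqueeze(0).expand(8, -1, -1)  # (batch, prototypes, dim)

retrieved, attn_weights = attention(query, memory, memory)
print(retrieved.shape, attn_weights.shape)  # (8, 1, 256) and (8, 1, 30)
```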
ABSTRACT
CNN-based Martian rock image processing has recently attracted much attention in Mars missions, since it can help planetary rovers autonomously recognize and collect high-value science targets. However, the difficulty of acquiring Martian rock images limits the accuracy of processing models. In this paper, we introduce a new dataset called "GMSRI" that mixes real Mars images with synthetic counterparts generated by a GAN. GMSRI aims to provide a set of Martian rock images sorted by the texture and spatial structure of the rocks. This paper offers a detailed analysis of GMSRI in its current state: five sub-trees with 28 leaf nodes and 30,000 images in total. We show that GMSRI is much larger in scale and diversity than existing datasets of the same kind. Constructing such a database is a challenging task, and we carefully describe the data collection, selection and generation processes in this paper. Moreover, we evaluate the usefulness of GMSRI with an image super-resolution task. We hope that the scale, diversity and hierarchical structure of GMSRI will offer opportunities to researchers in the Mars exploration community and beyond.
Subjects
Extraterrestrial Environment; Mars
ABSTRACT
Existing deep learning-based personal protective equipment (PPE) detectors can only detect limited types of PPE, and their performance needs to be improved, particularly for deployment on real construction sites. This paper introduces an approach to train and evaluate eight deep learning detectors, for real application purposes, based on You Only Look Once (YOLO) architectures for six classes: helmets in four colours, person, and vest. Meanwhile, a dedicated high-quality dataset, CHV, consisting of 1330 images, is constructed by considering real construction-site backgrounds, different gestures, varied angles and distances, and multiple PPE classes. The comparison among the eight models shows that YOLO v5x has the best mAP (86.55%) and YOLO v5s has the fastest speed (52 FPS) on GPU. The detection accuracy of the helmet classes on blurred faces decreases by 7%, while the person and vest classes are unaffected. The proposed detectors trained on the CHV dataset also outperform other deep learning approaches on the same datasets. The novel multiclass CHV dataset is open for public use.
ABSTRACT
Optical coherence tomography angiography (OCTA) is a novel imaging modality that has been widely utilized in ophthalmology and neuroscience studies to observe retinal vessels and microvascular systems. However, publicly available OCTA datasets remain scarce. In this paper, we introduce the largest and most comprehensive OCTA dataset to date, dubbed OCTA-500, which contains OCTA imaging under two fields of view (FOVs) from 500 subjects. The dataset provides rich images and annotations, including two modalities (OCT/OCTA volumes), six types of projections, four types of text labels (age/gender/eye/disease) and seven types of segmentation labels (large vessel/capillary/artery/vein/2D FAZ/3D FAZ/retinal layers). We then propose a multi-object segmentation task called CAVF, which integrates capillary segmentation, artery segmentation, vein segmentation, and FAZ segmentation under a unified framework. In addition, we optimize the 3D-to-2D image projection network (IPN) into IPN-V2 to serve as one of the segmentation baselines. Experimental results demonstrate that IPN-V2 achieves an approximately 10% mIoU improvement over IPN on the CAVF task. Finally, we further study the impact of several dataset characteristics: the training set size, the model input (OCT/OCTA, 3D volume/2D projection), the baseline networks, and the diseases. The dataset and code are publicly available at: https://ieee-dataport.org/open-access/octa-500.
Subjects
Angiography; Tomography, Optical Coherence; Humans; Retina/diagnostic imaging; Retinal Vessels/diagnostic imaging
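Since the dataset ships several 2D projections of the 3D OCT/OCTA volumes and IPN/IPN-V2 learn a 3D-to-2D projection, the snippet below shows the simplest hand-crafted baselines: maximum- and mean-intensity projections along the depth axis with NumPy. The volume dimensions are placeholders, not OCTA-500 data.

```python
# Illustrative sketch: simple hand-crafted 3D-to-2D projections of an OCTA volume.
# A learned projection (IPN / IPN-V2) replaces these fixed operators in the paper.
import numpy as np

rng = np.random.default_rng(0)
octa_volume = rng.random((640, 400, 400))  # placeholder (depth, height, width) volume

max_projection = octa_volume.max(axis=0)    # maximum-intensity projection along depth
mean_projection = octa_volume.mean(axis=0)  # mean-intensity projection along depth
print(max_projection.shape, mean_projection.shape)  # both (400, 400)
```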
ABSTRACT
This article presents the chili and onion leaf (COLD) dataset, which focuses on the leaves of chili (Capsicum) and onion (Allium cepa) plants. Diseases such as purple blotch, Stemphylium leaf blight, Colletotrichum leaf blight, and Iris yellow spot virus in onion, and Cercospora leaf spot, powdery mildew, murda complex syndrome, and nutritional deficiency in chili, have had a significant negative effect on onion and chili production, and farmers have incurred financial losses as a consequence. Computer vision and image-processing algorithms have been widely used in recent years for a range of applications, such as diagnosing and categorizing plant leaf diseases. In this paper we introduce a detailed chili and onion leaf dataset gathered from Chilwadigi village in Karnataka, a region with varying climatic conditions. The dataset contains a variety of chili and onion leaf categories carefully selected to tackle the complex challenges of categorizing leaf images taken in natural environments, such as subtle inter-class similarities, changes in lighting, and differences in background conditions like varied foliage arrangements and light levels. We carefully documented chili and onion leaves from various angles using a high-resolution camera to create a diverse and reliable dataset. The dataset is set to be a valuable resource for enhancing computer vision algorithms, from traditional deep learning models to cutting-edge vision transformer architectures, and will help in creating advanced image recognition systems specifically designed for identifying chili and onion plants. By making this dataset publicly accessible, our goal is to empower researchers to develop new computer vision techniques to tackle the unique challenges of chili and onion leaf recognition. The dataset is freely available at the following DOIs: http://doi.org/10.17632/7nxxn4gj5s.3 and http://doi.org/10.17632/tf9dtfz9m6.3.
ABSTRACT
Tamil is one of the oldest existing languages, spoken by around 65 million people across India, Sri Lanka and South-East Asia; countries such as Fiji and South Africa also have significant populations with Tamil ancestry. Tamil is a complex language with 247 characters. A labelled dataset for Tamil fingerspelling, named TLFS23, has been created for research on vision-based fingerspelling translators for the speech- and hearing-impaired. The dataset opens up avenues for developing automated systems, using computer vision and deep learning algorithms, that act as translators and interpreters for effective communication between fingerspelling users and non-users. One thousand images representing the unique finger flexion for each Tamil character were collected, constituting a large dataset of 248 classes with a total of 255,155 images. The images were contributed by 120 individuals from different age groups. The dataset is made publicly available at: https://data.mendeley.com/datasets/39kzs5pxmk/2.
ABSTRACT
Detecting emergency aircraft landing sites is crucial for ensuring passenger and crew safety during unexpected forced landings caused by factors such as engine malfunctions, adverse weather, or other aviation emergencies. In this article, we present a dataset of Google Maps images with corresponding masks, specifically crafted with manual annotations of emergency aircraft landing sites that distinguish safe areas, with suitable conditions for emergency landings, from unsafe areas presenting hazardous conditions. Drawing on detailed guidelines from the Federal Aviation Administration, the annotations focus on key features such as slope, surface type, and obstacle presence, with the goal of pinpointing appropriate landing areas. The proposed dataset has 4180 images: 2090 raw images accompanied by their corresponding annotation instances. The dataset employs a semantic segmentation approach, categorizing image pixels into two classes, "Safe" and "Unsafe", based on authenticated terrain-specific attributes, thereby offering a nuanced understanding of the viability of various landing sites in emergency scenarios.
ABSTRACT
BACKGROUND AND OBJECTIVE: The metabolic syndrome induced by obesity is closely associated with cardiovascular disease, and its prevalence is increasing globally year by year. Obesity is a risk marker for detecting this disease. However, current research on computer-aided detection of adipose distribution is hampered by the lack of open-source, large abdominal adipose datasets. METHODS: In this study, a benchmark Abdominal Adipose Tissue CT Image Dataset (AATCT-IDS) containing 300 subjects is prepared and published. AATCT-IDS provides 13,732 raw CT slices, and the researchers individually annotate the subcutaneous and visceral adipose tissue regions of 3213 of those slices, sampled at the same slice spacing, to validate denoising methods, train semantic segmentation models, and study radiomics. For each task, this paper compares and analyzes the performance of various methods on AATCT-IDS by combining visualization results and evaluation data, thereby verifying the research potential of this dataset for these three types of tasks. RESULTS: In the comparative study of image denoising, algorithms using a smoothing strategy suppress mixed noise at the expense of image details and obtain better evaluation scores, whereas methods such as BM3D preserve the original image structure better, although their evaluation scores are slightly lower; the results show significant differences among them. In the comparative study of semantic segmentation of abdominal adipose tissue, the segmentation results of each model show different structural characteristics. Among them, BiSeNet obtains segmentation results only slightly inferior to U-Net, with the shortest training time, and effectively separates small and isolated adipose tissue. In addition, the radiomics study based on AATCT-IDS reveals three adipose distributions in the subject population. CONCLUSION: AATCT-IDS contains the ground truth of adipose tissue regions in abdominal CT slices. This open-source dataset can attract researchers to explore the multi-dimensional characteristics of abdominal adipose tissue and thus help physicians and patients in clinical practice. AATCT-IDS is freely published for non-commercial purposes at: https://figshare.com/articles/dataset/AATTCT-IDS/23807256.
Subjects
Abdominal Fat; Tomography, X-Ray Computed; Humans; Tomography, X-Ray Computed/methods; Abdominal Fat/diagnostic imaging; Male; Female; Databases, Factual; Algorithms; Radiomics
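The denoising comparison described above weighs noise suppression against detail preservation. As a minimal stand-in, the sketch below adds synthetic Gaussian noise to a test image and scores a plain Gaussian smoother against a non-local-means filter by PSNR and SSIM with scikit-image. The image is a standard test picture, not an AATCT-IDS CT slice, and the filter parameters are assumptions.

```python
# Illustrative sketch: scoring two denoisers by PSNR/SSIM on a synthetically noised image.
# The image is a scikit-image sample, not an AATCT-IDS CT slice.
import numpy as np
from skimage import data, img_as_float
from skimage.filters import gaussian
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from skimage.restoration import denoise_nl_means
from skimage.util import random_noise

clean = img_as_float(data.camera())                      # placeholder grayscale image
noisy = random_noise(clean, mode="gaussian", var=0.01)   # synthetic noise

smoothed = gaussian(noisy, sigma=1.5)                                           # smoothing strategy
nlm = denoise_nl_means(noisy, h=0.08, patch_size=5, patch_distance=6, fast_mode=True)

for name, denoised in [("gaussian smoothing", smoothed), ("non-local means", nlm)]:
    psnr = peak_signal_noise_ratio(clean, denoised, data_range=1.0)
    ssim = structural_similarity(clean, denoised, data_range=1.0)
    print(f"{name}: PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")
```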
ABSTRACT
BACKGROUND: Endometrial cancer is one of the most common tumors of the female reproductive system and is the third most common gynecological malignancy causing death, after ovarian and cervical cancer. Early diagnosis can significantly improve the 5-year survival rate of patients. With the development of artificial intelligence, computer-assisted diagnosis plays an increasingly important role in improving the accuracy and objectivity of diagnosis and reducing the workload of doctors. However, the absence of publicly available image datasets restricts the application of computer-assisted diagnostic techniques. METHODS: In this paper, a publicly available Endometrial Cancer PET/CT Image Dataset for Evaluation of Semantic Segmentation and Detection of Hypermetabolic Regions (ECPC-IDS) is published. Specifically, the segmentation section includes PET and CT images, totaling 7159 images in multiple formats. To demonstrate the effectiveness of segmentation on ECPC-IDS, six deep learning semantic segmentation methods are selected to test the image segmentation task. The object detection section also includes PET and CT images, totaling 3579 images together with XML files containing annotation information; eight deep learning methods are selected for experiments on the detection task. RESULTS: This study is conducted using deep learning-based semantic segmentation and object detection methods to demonstrate the distinguishability of ECPC-IDS. For the segmentation task, the minimum and maximum Dice values on PET images are 0.546 and 0.743, respectively, and the minimum and maximum Dice values on CT images are 0.012 and 0.510, respectively. For the detection task, the maximum mAP values on PET and CT images are 0.993 and 0.986, respectively. CONCLUSION: To the best of our knowledge, this is the first publicly available endometrial cancer dataset with a large number of multi-modality images. ECPC-IDS can assist researchers in exploring new algorithms to enhance computer-assisted diagnosis, benefiting both clinical doctors and patients. ECPC-IDS is freely published for non-commercial use at: https://figshare.com/articles/dataset/ECPC-IDS/23808258.
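For reference, the Dice values reported above measure overlap between predicted and annotated regions. Below is a minimal sketch of the Dice similarity coefficient for binary masks; the masks are random placeholders, not ECPC-IDS annotations.

```python
# Illustrative sketch: Dice similarity coefficient for binary segmentation masks.
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2 * |intersection| / (|P| + |T|) for boolean masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Placeholder masks standing in for a predicted and an annotated hypermetabolic region.
rng = np.random.default_rng(0)
predicted_mask = rng.random((256, 256)) > 0.5
ground_truth_mask = rng.random((256, 256)) > 0.5
print(f"Dice: {dice_coefficient(predicted_mask, ground_truth_mask):.3f}")
```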