ABSTRACT
Deep learning models for medical image segmentation are usually trained with voxel-wise losses, e.g., cross-entropy loss, focusing on unary supervision without considering inter-voxel relationships. This oversight potentially leads to semantically inconsistent predictions. Here, we propose a contextual similarity loss (CSL) and a structural similarity loss (SSL) to explicitly and efficiently incorporate inter-voxel relationships for improved performance. The CSL promotes consistency in predicted object categories for each image sub-region compared to ground truth. The SSL enforces compatibility between the predictions of voxel pairs by computing pair-wise distances between them, ensuring that voxels of the same class are close together whereas those from different classes are separated by a wide margin in the distribution space. The effectiveness of the CSL and SSL is evaluated using a clinical cone-beam computed tomography (CBCT) dataset of patients with various craniomaxillofacial (CMF) deformities and a public pancreas dataset. Experimental results show that the CSL and SSL outperform state-of-the-art regional loss functions in preserving segmentation semantics.
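As a rough, non-authoritative sketch of the pair-wise idea behind the SSL (illustrative names and margin value; not the paper's implementation), the following NumPy snippet pulls the predicted class distributions of same-class voxel pairs together and pushes different-class pairs apart by a margin:

```python
import numpy as np

def structural_similarity_loss(probs, labels, margin=1.0):
    """Hedged sketch of a pair-wise structural similarity loss.

    probs:  (N, C) predicted class distributions for N sampled voxels.
    labels: (N,)   ground-truth class indices.
    Same-class pairs are pulled together; different-class pairs are
    pushed at least `margin` apart in distribution space.
    """
    n = probs.shape[0]
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(probs[i] - probs[j])
            if labels[i] == labels[j]:
                loss += d ** 2                     # pull same-class pairs close
            else:
                loss += max(0.0, margin - d) ** 2  # push different classes apart
            pairs += 1
    return loss / max(pairs, 1)
```

In practice such a term would be computed on voxel pairs sampled per batch and added to the voxel-wise cross-entropy loss.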
ABSTRACT
In orthognathic surgical planning for patients with jaw deformities, it is crucial to accurately simulate the changes in facial appearance that follow the bony movement. Compared with the traditional biomechanics-based methods like the finite-element method (FEM), which are both labor-intensive and computationally inefficient, deep learning-based methods offer an efficient and robust modeling alternative. However, current methods do not account for the physical relationship between facial soft tissue and bony structure, causing them to fall short in accuracy compared to FEM. In this work, we propose an Attentive Correspondence assisted Movement Transformation network (ACMT-Net) to predict facial changes by correlating facial soft tissue changes with bony movement through a point-to-point attentive correspondence matrix. To ensure efficient training, we also introduce a contrastive loss for self-supervised pre-training of the ACMT-Net with a k-Nearest Neighbors (k-NN) based clustering. Experimental results on patients with jaw deformities show that our proposed solution can achieve significantly improved computational efficiency over the state-of-the-art FEM-based method with comparable facial change prediction accuracy.
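The point-to-point attentive correspondence idea can be sketched as a cross-attention step (all names here are illustrative stand-ins; ACMT-Net's actual architecture is more elaborate): per-point facial and bony features are correlated into a matrix whose rows weight how much each bony point's movement contributes to each facial point.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transfer_movement(face_feat, bone_feat, bone_disp):
    """Hypothetical sketch: correlate per-point features into an attentive
    correspondence matrix, then carry bony displacements over to facial
    points as attention-weighted averages.

    face_feat: (N_face, D) facial point features.
    bone_feat: (N_bone, D) bony point features.
    bone_disp: (N_bone, 3) planned bony movement vectors.
    """
    scale = np.sqrt(face_feat.shape[1])
    corr = softmax(face_feat @ bone_feat.T / scale, axis=1)  # (N_face, N_bone)
    return corr @ bone_disp                                  # (N_face, 3)
```

Because each row of the correspondence matrix sums to one, a uniform bony movement is transferred unchanged to every facial point, which is a useful sanity check for this kind of formulation.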
Subject(s)
Face, Movement, Humans, Face/diagnostic imaging, Biomechanical Phenomena, Computer Simulation
ABSTRACT
BACKGROUND AND OBJECTIVE: Computer-aided surgical simulation (CASS) can be used to virtually plan ideal outcomes of craniosynostosis surgery. Our purpose was to create a workflow analyzing the accuracy of surgical outcomes relative to virtually planned fronto-orbital advancement (FOA). METHODS: Patients who underwent FOA using CASS between October 1, 2017, and February 28, 2022, at our center and had postoperative computed tomography within 6 months of surgery were included. Virtual 3-dimensional (3D) models were created and coregistered using each patient's preoperative and postoperative computed tomography data. Three points on each bony segment were used to define the object in 3D space. Each planned bony segment was manipulated to match the actual postoperative outcome. The change in position of the 3D object was measured in translational (X, Y, Z) and rotational (roll, pitch, yaw) aspects to represent differences between planned and actual postoperative positions. The difference in the translational position of several bony landmarks was also recorded. Wilcoxon signed-rank tests were performed to measure significance of these differences from the ideal value of 0, which would indicate no difference between preoperative plan and postoperative outcome. RESULTS: Data for 63 bony segments were analyzed from 8 patients who met the inclusion criteria. Median differences between planned and actual outcomes of the segment groups ranged from -0.3 to -1.3 mm in the X plane; 1.4 to 5.6 mm in the Y plane; 0.9 to 2.7 mm in the Z plane; -1.2° to -4.5° in pitch; -0.1° to 1.0° in roll; and -2.8° to 1.0° in yaw. No significant difference from 0 was found in 21 of 24 segment region/side combinations. Translational differences of bony landmarks ranged from -2.7 to 3.6 mm. CONCLUSION: A high degree of accuracy was observed relative to the CASS plan. Virtual analysis of surgical accuracy in FOA using CASS was feasible.
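A minimal sketch of how such translational and rotational differences can be recovered from corresponding landmark points on a planned and an actual bony segment (the Kabsch algorithm plus one common Euler-angle convention; the study's exact convention and software are not specified):

```python
import numpy as np

def rigid_difference(planned_pts, actual_pts):
    """Kabsch algorithm sketch: recover the rigid transform (R, t) that
    best maps planned landmark points onto their actual postoperative
    positions."""
    p_c = planned_pts.mean(axis=0)
    a_c = actual_pts.mean(axis=0)
    H = (planned_pts - p_c).T @ (actual_pts - a_c)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = a_c - R @ p_c
    return R, t

def yaw_pitch_roll(R):
    """Z-Y-X Euler angles in degrees (one common convention; the paper's
    exact roll/pitch/yaw axes are an assumption here)."""
    yaw = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    pitch = np.degrees(np.arcsin(-R[2, 0]))
    roll = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    return yaw, pitch, roll
```

With exact correspondences and non-degenerate points this recovers the planned-to-actual transform exactly; with noisy landmarks it gives the least-squares rigid fit.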
Subject(s)
Craniosynostoses, Surgery, Computer-Assisted, Humans, Pilot Projects, Surgery, Computer-Assisted/methods, Craniosynostoses/diagnostic imaging, Craniosynostoses/surgery, Treatment Outcome, Computers
ABSTRACT
This paper proposes a deep learning framework to encode subject-specific transformations between facial and bony shapes for orthognathic surgical planning. Our framework involves a bidirectional point-to-point convolutional network (P2P-Conv) to predict the transformations between facial and bony shapes. P2P-Conv is an extension of the state-of-the-art P2P-Net and leverages dynamic point-wise convolution (i.e., PointConv) to capture local-to-global spatial information. Data augmentation is carried out in the training of P2P-Conv with multiple point subsets from the facial and bony shapes. During inference, network outputs generated for multiple point subsets are combined into a dense transformation. Finally, non-rigid registration using the coherent point drift (CPD) algorithm is applied to generate surface meshes based on the predicted point sets. Experimental results on real-subject data demonstrate that our method substantially improves the prediction of facial and bony shapes over state-of-the-art methods.
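The inference-time merging of per-subset outputs into a dense transformation might be sketched as follows (the subset predictor is a placeholder for P2P-Conv, and the partitioning strategy is an assumption):

```python
import numpy as np

def dense_transform_from_subsets(points, predict_subset, n_subsets=4, seed=0):
    """Hedged sketch: run the network on several random point subsets and
    scatter the per-subset outputs back into one dense displacement field.

    points:         (N, 3) input point cloud.
    predict_subset: stand-in network mapping an (M, 3) subset to (M, 3)
                    displacements.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(points))          # random partition of points
    dense = np.zeros((len(points), 3))
    for chunk in np.array_split(order, n_subsets):
        dense[chunk] = predict_subset(points[chunk])
    return dense
```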
ABSTRACT
Orthognathic surgery corrects jaw deformities to improve aesthetics and functions. Due to the complexity of the craniomaxillofacial (CMF) anatomy, orthognathic surgery requires precise surgical planning, which involves predicting postoperative changes in facial appearance. To this end, most conventional methods involve simulation with biomechanical modeling methods, which are labor intensive and computationally expensive. Here we introduce a learning-based framework to speed up the simulation of postoperative facial appearances. Specifically, we introduce a facial shape change prediction network (FSC-Net) to learn the nonlinear mapping from bony shape changes to facial shape changes. FSC-Net is a point transform network weakly-supervised by paired preoperative and postoperative data without point-wise correspondence. In FSC-Net, a distance-guided shape loss places more emphasis on the jaw region. A local point constraint loss restricts point displacements to preserve the topology and smoothness of the surface mesh after point transformation. Evaluation results indicate that FSC-Net achieves 15× speedup with accuracy comparable to a state-of-the-art (SOTA) finite-element modeling (FEM) method.
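One possible reading of the local point constraint loss, sketched in NumPy (the neighborhood size and exact formulation are assumptions, not FSC-Net's published definition): each point's displacement is penalized for deviating from the average displacement of its nearest neighbors, which discourages folds and preserves mesh smoothness.

```python
import numpy as np

def local_point_constraint(points, disp, k=4):
    """Hedged sketch of a local point constraint: compare each point's
    displacement with the mean displacement of its k nearest neighbours.

    points: (N, 3) point positions.
    disp:   (N, 3) predicted per-point displacements.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude the point itself
    idx = np.argsort(d2, axis=1)[:, :k]          # k-NN indices per point
    neigh_mean = disp[idx].mean(axis=1)
    return float(np.mean(np.linalg.norm(disp - neigh_mean, axis=1)))
```

A rigid, uniform displacement incurs zero penalty, while displacements that tear the local neighborhood apart are penalized.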
Subject(s)
Deep Learning, Orthognathic Surgery, Orthognathic Surgical Procedures, Orthognathic Surgical Procedures/methods, Computer Simulation, Face/diagnostic imaging, Face/surgery
ABSTRACT
Domain adaptation techniques have been demonstrated to be effective in addressing label deficiency in medical image segmentation. However, conventional domain adaptation approaches often concentrate on matching global marginal distributions between domains in a class-agnostic fashion. In this paper, we present a dual-attention domain-adaptive segmentation network (DADASeg-Net) for cross-modality medical image segmentation. The key contribution of DADASeg-Net is a novel dual adversarial attention mechanism, which regularizes the domain adaptation module with two attention maps, from the spatial and class perspectives respectively. Specifically, the spatial attention map guides the domain adaptation module to focus on regions that are challenging to align. The class attention map encourages the domain adaptation module to capture class-specific rather than class-agnostic knowledge for distribution alignment. DADASeg-Net shows superior performance on two challenging medical image segmentation tasks.
Subject(s)
Image Processing, Computer-Assisted, Neural Networks, Computer, Image Processing, Computer-Assisted/methods
ABSTRACT
Cephalometric analysis relies on accurate detection of craniomaxillofacial (CMF) landmarks from cone-beam computed tomography (CBCT) images. However, due to the complexity of CMF bony structures, it is difficult to localize landmarks efficiently and accurately. In this paper, we propose a deep learning framework to tackle this challenge by jointly digitalizing 105 CMF landmarks on CBCT images. By explicitly learning the local geometrical relationships between the landmarks, our approach extends Mask R-CNN for end-to-end prediction of landmark locations. Specifically, we first apply a detection network on a down-sampled 3D image to leverage global contextual information to predict the approximate locations of the landmarks. We subsequently leverage local information provided by higher-resolution image patches to refine the landmark locations. On patients with varying non-syndromic jaw deformities, our method achieves an average detection accuracy of 1.38± 0.95mm, outperforming a related state-of-the-art method.
Subject(s)
Spiral Cone-Beam Computed Tomography, Anatomic Landmarks, Cephalometry/methods, Cone-Beam Computed Tomography/methods, Humans, Image Processing, Computer-Assisted/methods, Imaging, Three-Dimensional/methods, Reproducibility of Results
ABSTRACT
PURPOSE: A facial reference frame is a 3-dimensional Cartesian coordinate system that includes 3 perpendicular planes: midsagittal, axial, and coronal. The order in which one defines the planes matters. The purposes of this study were to determine the following: 1) which sequence (axial-midsagittal-coronal vs midsagittal-axial-coronal) produced more appropriate reference frames and 2) whether orbital or auricular dystopia influenced the outcomes. METHODS: This was an ambispective cross-sectional study. Fifty-four subjects with facial asymmetry were included. The facial reference frames of each subject (outcome variable) were constructed using 2 methods (independent variable): axial plane first and midsagittal plane first. Two board-certified orthodontists together blindly evaluated the results using a 3-point categorical scale based on careful inspection and expert intuition. The covariate for stratification was the presence of orbital or auricular dystopia. Finally, Wilcoxon signed rank tests were performed. RESULTS: The facial reference frames defined by the midsagittal plane first method were statistically significantly different from those defined by the axial plane first method (P = .001). Using the midsagittal plane first method, the reference frames were more appropriately defined in 22 (40.7%) subjects, equivalent in 26 (48.1%), and less appropriately defined in 6 (11.1%). After stratification by orbital or auricular dystopia, the results also showed that the reference frame computed using the midsagittal plane first method was statistically significantly more appropriate in both subject groups, regardless of the presence of orbital or auricular dystopia (27 with orbital or auricular dystopia and 27 without; both P < .05). CONCLUSIONS: The midsagittal plane first sequence improves the facial reference frames compared with the traditional axial plane first approach. However, regardless of the sequence used, clinicians need to judge the correctness of the reference frame before diagnosis or surgical planning.
Subject(s)
Anatomic Landmarks, Imaging, Three-Dimensional, Computers, Cross-Sectional Studies, Facial Asymmetry, Humans, Imaging, Three-Dimensional/methods
ABSTRACT
Skull segmentation from three-dimensional (3D) cone-beam computed tomography (CBCT) images is critical for the diagnosis and treatment planning of patients with craniomaxillofacial (CMF) deformities. Convolutional neural network (CNN)-based methods currently dominate volumetric image segmentation, but they are constrained by limited GPU memory and large image sizes (e.g., 512 × 512 × 448). Typical ad-hoc strategies, such as down-sampling or patch cropping, degrade segmentation accuracy because they fail to capture local fine details or global contextual information. Other methods, such as Global-Local Networks (GLNet), focus on improving the neural network itself, aiming to combine local details and global contextual information in a GPU memory-efficient manner. However, all of these methods operate on regular grids, which are computationally inefficient for volumetric image segmentation. In this work, we propose a novel VoxelRend-based network (VR-U-Net) that combines a memory-efficient variant of 3D U-Net with a voxel-based rendering (VoxelRend) module, which refines local details via voxel-based predictions on non-regular grids. Built on relatively coarse feature maps, the VoxelRend module achieves a significant improvement in segmentation accuracy with a fraction of the GPU memory consumption. We evaluate the proposed VR-U-Net on the skull segmentation task using a high-resolution CBCT dataset collected from local hospitals. Experimental results show that VR-U-Net yields high-quality segmentation results in a memory-efficient manner, highlighting the practical value of our method.
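A toy illustration of the VoxelRend-style refinement step (the uncertainty measure and the point-wise head are simplified stand-ins for the module described above): pick the most uncertain voxels of a coarse probability map and overwrite them with predictions from a finer point-wise head.

```python
import numpy as np

def voxelrend_refine(coarse_prob, fine_predict, n_points=8):
    """Hedged sketch of a VoxelRend-style step: select the most ambiguous
    voxels of a coarse foreground-probability map and replace them with
    point-wise predictions from a finer head (`fine_predict` is a stand-in
    that maps (M, 3) voxel coordinates to M refined probabilities)."""
    uncertainty = -np.abs(coarse_prob - 0.5)            # high near the boundary
    flat = np.argsort(uncertainty.ravel())[-n_points:]  # most uncertain voxels
    coords = np.stack(np.unravel_index(flat, coarse_prob.shape), axis=1)
    refined = coarse_prob.copy()
    refined[tuple(coords.T)] = fine_predict(coords)
    return refined
```

Because only a small set of boundary voxels is re-predicted, the refinement cost is a fraction of re-running a dense high-resolution network.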
ABSTRACT
Virtual orthognathic surgical planning involves simulating surgical corrections of jaw deformities on 3D facial bony shape models. Due to the lack of necessary guidance, the planning procedure is highly experience-dependent and the planning results are often suboptimal. A reference facial bony shape model representing normal anatomies can provide objective guidance to improve planning accuracy. Therefore, we propose a self-supervised deep framework to automatically estimate reference facial bony shape models. Our framework is an end-to-end trainable network, consisting of a simulator and a corrector. In the training stage, the simulator maps jaw deformities of a patient bone to a normal bone to generate a simulated deformed bone. The corrector then restores the simulated deformed bone back to normal. In the inference stage, the trained corrector is applied to generate a patient-specific normal-looking reference bone from a real deformed bone. The proposed framework was evaluated using a clinical dataset and compared with a state-of-the-art method based on a supervised point-cloud network. Experimental results show that the shape models estimated by our approach are clinically acceptable and significantly more accurate than those of the competing method.
ABSTRACT
Dental landmark localization is a fundamental step in analyzing dental models for the planning of orthodontic or orthognathic surgery. However, current clinical practice requires clinicians to manually digitize more than 60 landmarks on 3D dental models. Automatic landmark detection can free clinicians from this tedious manual annotation and improve localization accuracy. Most existing landmark detection methods fail to capture local geometric contexts, causing large errors and misdetections. We propose an end-to-end learning framework to automatically localize 68 landmarks on high-resolution dental surfaces. Our network hierarchically extracts multi-scale local contextual features along two paths: a landmark localization path and a landmark area-of-interest segmentation path. Higher-level features are learned by fusing local-to-global features from the two paths to predict the landmark heatmap and the landmark area segmentation map. An attention mechanism is then applied to the two maps to refine the landmark positions. We evaluated our framework on a real-patient dataset consisting of 77 high-resolution dental surfaces. Our approach achieves an average localization error of 0.42 mm, significantly outperforming related state-of-the-art methods.
ABSTRACT
Accurate bone segmentation and landmark detection are two essential preparation tasks in computer-aided surgical planning for patients with craniomaxillofacial (CMF) deformities. Surgeons typically have to complete the two tasks manually, spending ~12 hours for each CBCT scan or ~5 hours for each CT scan. To tackle these problems, we propose a multi-stage coarse-to-fine CNN-based framework, called SkullEngine, for high-resolution segmentation and large-scale landmark detection through a collaborative, integrated, and scalable JSD model and three segmentation and landmark detection refinement models. We evaluated our framework on a clinical dataset consisting of 170 CBCT/CT images for the task of segmenting 2 bones (midface and mandible) and detecting 175 clinically common landmarks on bones, teeth, and soft tissues. Experimental results show that SkullEngine significantly improves segmentation quality, especially in regions where the bone is thin. In addition, SkullEngine efficiently and accurately detects all 175 landmarks. Both tasks were completed simultaneously within 3 minutes, for both CBCT and CT, with high segmentation quality. SkullEngine has now been integrated into a clinical workflow to further evaluate its clinical efficiency.
ABSTRACT
Facial appearance changes with the movements of bony segments in orthognathic surgery of patients with craniomaxillofacial (CMF) deformities. Conventional biomechanical methods for simulating such changes, such as finite element modeling (FEM), are labor-intensive and computationally expensive, preventing their use in clinical settings. To overcome these limitations, we propose a deep learning framework to predict post-operative facial changes. Specifically, FC-Net, a facial appearance change simulation network, is developed to predict the point displacement vectors associated with a facial point cloud. FC-Net learns the point displacements of a pre-operative facial point cloud from the bony movement vectors between pre-operative and simulated post-operative bony models. FC-Net is a weakly-supervised point displacement network trained using paired data without strict point-to-point correspondence. To preserve the topology of the facial model during point transformation, we employ a local-point-transform loss to constrain the local movements of points. Experimental results on real patient data show that the proposed framework predicts post-operative facial appearance changes remarkably faster than a state-of-the-art FEM method, with comparable prediction accuracy.
ABSTRACT
PURPOSE: The purpose of this study was to reduce experience dependence in orthognathic surgical planning, which involves virtually simulating the corrective procedure for jaw deformities. METHODS: We introduce a geometric deep learning framework for generating reference facial bone shape models that provide objective guidance in surgical planning. First, we propose a surface deformation network that warps a patient's deformed bone to a set of normal bones, generating a dictionary of patient-specific normal bony shapes. Subsequently, sparse representation learning is employed to estimate a reference shape model from the dictionary. RESULTS: We evaluated our method on a clinical dataset containing 24 patients and compared it with a state-of-the-art method that relies on landmark-based sparse representation. Our method yields significantly higher accuracy than the competing method in estimating normal jaws, and it preserves the midface of the patient's facial bones as well as the conventional approach does. CONCLUSIONS: Experimental results indicate that our method generates accurate shape models that meet clinical standards.
Subject(s)
Jaw Abnormalities, Orthognathic Surgical Procedures, Humans, Imaging, Three-Dimensional, Jaw, Unsupervised Machine Learning
ABSTRACT
Automatic craniomaxillofacial (CMF) landmark localization from cone-beam computed tomography (CBCT) images is challenging, considering that 1) the number of landmarks in the images may change due to varying deformities and traumatic defects, and 2) the CBCT images used in clinical practice are typically large. In this paper, we propose a two-stage, coarse-to-fine deep learning method to tackle these challenges with both speed and accuracy in mind. Specifically, we first use a 3D faster R-CNN to roughly locate landmarks in down-sampled CBCT images that have varying numbers of landmarks. By converting the landmark point detection problem to a generic object detection problem, our 3D faster R-CNN is formulated to detect virtual, fixed-size objects in small boxes with centers indicating the approximate locations of the landmarks. Based on the rough landmark locations, we then crop 3D patches from the high-resolution images and send them to a multi-scale UNet for the regression of heatmaps, from which the refined landmark locations are finally derived. We evaluated the proposed approach by detecting up to 18 landmarks on a real clinical dataset of CMF CBCT images with various conditions. Experiments show that our approach achieves state-of-the-art accuracy of 0.89 ± 0.64mm in an average time of 26.2 seconds per volume.
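The heatmap-to-coordinate readout in the refinement stage might look like the following sketch (a weighted-centroid readout is one common decoding; the method may use argmax or another scheme, and the names here are illustrative):

```python
import numpy as np

def heatmap_to_landmark(heatmap, patch_origin, spacing=1.0):
    """Hedged sketch of the refinement readout: take a predicted 3D heatmap
    over a cropped patch and decode the landmark as the intensity-weighted
    centroid, mapped back into image coordinates.

    heatmap:      (D, H, W) non-negative heatmap over the patch.
    patch_origin: (3,) position of the patch's corner voxel in the image.
    spacing:      voxel size scaling from patch voxels to image units.
    """
    w = np.clip(heatmap, 0, None)
    grid = np.indices(heatmap.shape).reshape(3, -1).T       # voxel coordinates
    centroid = (grid * w.reshape(-1, 1)).sum(0) / w.sum()   # weighted centroid
    return patch_origin + centroid * spacing
```

The weighted centroid gives sub-voxel precision, which matters when the reported accuracy is below 1 mm.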
Subject(s)
Cone-Beam Computed Tomography, Imaging, Three-Dimensional
ABSTRACT
Accurate prediction of facial soft-tissue changes following orthognathic surgery is crucial for improving surgical outcomes. We developed a novel incremental simulation approach using the finite element method (FEM) with a realistic lip sliding effect to improve prediction accuracy in the lip region. First, a lip-detailed mesh is generated based on accurately digitized lip surface points. Second, an improved facial soft-tissue change simulation method is developed by applying a lip sliding effect along with the mucosa sliding effect. Finally, the soft-tissue change initiated by orthognathic surgery is simulated incrementally to facilitate a natural transition of the facial change and improve the effectiveness of the sliding effects. Our method was quantitatively validated on 35 retrospective clinical data sets by comparing it with the traditional FEM simulation method and with FEM simulation using the mucosa sliding effect only. The surface deviation error of our method showed significant improvement in the upper and lower lips over the two prior methods. In addition, evaluation with our lip-shape analysis, which reflects clinicians' qualitative assessment, also confirmed significantly improved lip prediction accuracy for the lower lip, and for both lips as a whole, compared with the other two methods. In conclusion, prediction accuracy in the clinically critical lip region improved significantly when incremental simulation with a realistic lip sliding effect was applied, compared with FEM simulation methods lacking the lip sliding effect.
Subject(s)
Lip, Orthognathic Surgery, Cephalometry, Humans, Lip/surgery, Mandible, Maxilla, Retrospective Studies
ABSTRACT
The dearth of annotated data is a major hurdle in building reliable image segmentation models. Manual annotation of medical images is tedious, time-consuming, and significantly variable across imaging modalities. The need for annotation can be reduced by leveraging an annotation-rich source modality in learning a segmentation model for an annotation-poor target modality. In this paper, we introduce a diverse data augmentation generative adversarial network (DDA-GAN) to train a segmentation model for an unannotated target image domain by borrowing information from an annotated source image domain. This is achieved by generating diverse augmented data for the target domain through one-to-many source-to-target translation. The DDA-GAN uses unpaired images from the source and target domains and is an end-to-end convolutional neural network that (i) explicitly disentangles domain-invariant structural features related to segmentation from domain-specific appearance features, (ii) combines structural features from the source domain with appearance features randomly sampled from the target domain for data augmentation, and (iii) trains the segmentation model with the augmented data in the target domain and the annotations from the source domain. The effectiveness of our method is demonstrated both qualitatively and quantitatively, in comparison with the state of the art, for segmentation of craniomaxillofacial bony structures from MRI and cardiac substructures from CT.
Subject(s)
Image Processing, Computer-Assisted, Neural Networks, Computer, Humans, Magnetic Resonance Imaging
ABSTRACT
Orthognathic surgical outcomes rely heavily on the quality of surgical planning. Automatic estimation of a reference facial bone shape significantly reduces experience-dependent variability and improves planning accuracy and efficiency. We propose an end-to-end deep learning framework to estimate patient-specific reference bony shape models for patients with orthognathic deformities. Specifically, we apply a point-cloud network to learn a vertex-wise deformation field from a patient's deformed bony shape, represented as a point cloud. The estimated deformation field is then used to correct the deformed bony shape to output a patient-specific reference bony surface model. To train our network effectively, we introduce a simulation strategy to synthesize deformed bones from any given normal bone, producing a relatively large and diverse dataset of shapes for training. Our method was evaluated using both synthetic and real patient data. Experimental results show that our framework estimates realistic reference bony shape models for patients with varying deformities. The performance of our method is consistently better than an existing method and several deep point-cloud networks. Our end-to-end estimation framework based on geometric deep learning shows great potential for improving clinical workflows.
Subject(s)
Deep Learning, Orthognathic Surgical Procedures, Bone and Bones, Humans
ABSTRACT
PURPOSE: Our current understanding of unilateral condylar hyperplasia (UCH) was put forth by Obwegeser, who hypothesized that UCH comprises 2 separate conditions: hemimandibular hyperplasia and hemimandibular elongation. This hypothesis rests on 3 assumptions: 1) the direction of overgrowth in UCH is bimodal (vertical or horizontal), with rare cases growing obliquely; 2) UCH can expand a hemimandible with or without significant condylar enlargement; and 3) there is an association between condylar expansion and the direction of overgrowth, with minimal expansion resulting in horizontal growth and significant enlargement causing vertical displacement. The purpose of this study was to test these assumptions. PATIENTS AND METHODS: We analyzed the computed tomography scans of 40 patients with UCH. First, we used a Silverman cluster analysis to determine how the direction of overgrowth is distributed in the UCH population. Next, we evaluated the relationship between hemimandibular overgrowth and condylar enlargement to confirm that overgrowth can occur independently of condylar expansion. Finally, we assessed the relationship between the degree of condylar enlargement and the direction of overgrowth to ascertain whether condylar expansion determines the direction of growth. RESULTS: Our first investigation demonstrated that the general impression that UCH is bimodal is wrong: the growth vectors in UCH are unimodally distributed, with the vast majority of cases growing diagonally. Our second investigation confirmed that UCH can expand a hemimandible with or without significant condylar enlargement. Our last investigation determined that in UCH there is no association between the degree of condylar expansion and the direction of overgrowth. CONCLUSIONS: The results of this study disprove the idea that UCH is 2 different conditions, hemimandibular hyperplasia and hemimandibular elongation, and provide new insights into the pathophysiology of UCH.
Subject(s)
Facial Asymmetry, Mandibular Condyle, Facial Asymmetry/diagnostic imaging, Facial Asymmetry/etiology, Facial Asymmetry/pathology, Humans, Hyperplasia, Hypertrophy/pathology, Male, Mandible/diagnostic imaging, Mandible/pathology, Mandibular Condyle/diagnostic imaging, Mandibular Condyle/pathology
ABSTRACT
OBJECTIVE: To estimate a patient-specific reference bone shape model for a patient with craniomaxillofacial (CMF) defects due to facial trauma. METHODS: We proposed an automatic facial bone shape estimation framework using pre-traumatic conventional portrait photos and post-traumatic head computed tomography (CT) scans via 3D face reconstruction and a deformable shape model. Specifically, a three-dimensional (3D) face was first reconstructed from the patient's pre-traumatic portrait photos. Second, a correlation model between the skin and bone surfaces was constructed using a sparse representation based on the CT images of training normal subjects. Third, by feeding the reconstructed 3D face into the correlation model, an initial reference shape model was generated. We then refined the initial estimate by applying non-rigid surface matching between the initially estimated shape and the patient's post-traumatic bone based on the adaptive-focus deformable shape model (AFDSM). Furthermore, a statistical shape model, built from the training normal subjects, was utilized to constrain the deformation process and avoid overfitting. RESULTS AND CONCLUSION: The proposed method was evaluated using both synthetic and real patient data. Experimental results show that the patient's abnormal facial bony structure can be recovered using our method, and the estimated reference shape model was considered clinically acceptable by an experienced CMF surgeon. SIGNIFICANCE: The proposed method is well suited to complex CMF defects in CMF reconstructive surgical planning.