ABSTRACT
Morphological analysis and volume measurement of the hippocampus are crucial to the study of many brain diseases, so an accurate hippocampal segmentation method benefits clinical research on these diseases. U-Net and its variants have become prevalent in hippocampus segmentation of magnetic resonance imaging (MRI) because of their effectiveness, and Transformer-based architectures have also received some attention. However, some existing methods focus too much on the shape and volume of the hippocampus rather than its spatial information, and the features they extract are treated independently, ignoring the correlation between local and global features. In addition, many methods cannot be applied effectively to practical medical image segmentation because of their large parameter counts and high computational complexity. To this end, we combine the advantages of CNNs and Vision Transformers (ViTs) and propose a simple, lightweight model, Light3DHS, for 3D hippocampus segmentation. To obtain richer local contextual features, the encoder first uses a multi-scale convolutional attention (MCA) module to learn the spatial information of the hippocampus. Considering the importance of local features and global semantics for 3D segmentation, we use a lightweight ViT to learn scale-invariant high-level features and further fuse local-to-global representations. To evaluate the effectiveness of the encoder's feature representation, we designed three decoders of different complexity to generate segmentation maps. Experiments on three common hippocampal datasets demonstrate that the network achieves more accurate hippocampus segmentation with fewer parameters, outperforming other state-of-the-art algorithms.
Subjects
Hippocampus , Imaging, Three-Dimensional , Magnetic Resonance Imaging , Hippocampus/diagnostic imaging , Humans , Magnetic Resonance Imaging/methods , Imaging, Three-Dimensional/methods , Neural Networks, Computer , Deep Learning , Algorithms
ABSTRACT
Shaping conducting polymers (CPs) into lightweight hydrogels can improve their functional performance in energy devices, chemical sensing, pollutant removal, drug delivery, and other applications. Current approaches to fabricating CP hydrogels are limited, and they mostly involve harsh conditions, tedious processing, compositing with other constituents, or unusual chemicals. Herein, a two-step route is introduced for the controllable fabrication of CP hydrogels under ambient conditions, in which gelation of shape-anisotropic nano-oxidants followed by in-situ oxidative polymerization leads to the formation of polyaniline (PANI) and polypyrrole hydrogels. The method is readily coupled with different materials processing approaches to form PANI hydrogels in varied shapes, including spherical beads, continuous wires, patterned films, and free-standing objects. In comparison with their bulk counterparts, lightweight PANI items of specific shapes exhibit improved properties when used as supercapacitor electrodes, gas sensors, or dye adsorbents. The current study therefore provides a general and controllable approach for processing CPs into hydrogels of varied external shapes, which can pave the way for the integration of lightweight CP structures with emerging functional devices.
ABSTRACT
Porous composites are important in engineering for their low weight, thermal insulation, and mechanical properties. However, increased porosity commonly decreases robustness, forcing a trade-off between mechanical performance and weight. Strengthening the solid phase is a promising way to improve robustness and low weight together. Here, acrylamide and calcium phosphate ionic oligomers are copolymerized, revealing that a pre-interaction between these precursors induces oriented crystallization of inorganic nanostructures during the linear polymerization of acrylamide, leading to the spontaneous formation of a bone-like nanostructure. The resulting solid phase shows enhanced mechanical properties, surpassing most biological materials. The bone-like nanostructure remains intact despite the introduction of porous structures at higher levels, resulting in a porous composite (P-APC) with high strength (yield strength of 10.5 MPa) and low weight (density below 0.22 g cm⁻³). Notably, its density-strength performance surpasses that of most reported porous materials. Additionally, P-APC shows ultralow thermal conductivity (45 mW m⁻¹ K⁻¹) due to its porous structure, making its combination of strength and thermal insulation superior to that of many reported materials. This work provides a robust, lightweight, and thermally insulating composite for practical application. It emphasizes the advantage of prefunctionalizing ionic oligomers for organic-inorganic copolymerization to create oriented nanostructures with toughened mechanics, offering an alternative strategy for producing robust lightweight materials.
ABSTRACT
Developing lightweight, high-performance electromagnetic wave (EMW) absorbing materials that can absorb harmful electromagnetic radiation is of great significance. Transition metal carbides and/or nitrides (MXenes) are a novel class of 2D nanosheets with a large aspect ratio, abundant polar functional groups, adjustable conductivity, and remarkable mechanical properties. These features enable the efficient assembly of MXene-based aerogels with ultra-low density, large specific surface area, tunable conductivity, and a unique 3D porous microstructure, all of which promote EMW absorption. MXene-based aerogels for EMW absorption have therefore attracted widespread attention. This review provides an overview of research progress on MXene-based aerogels for EMW absorption, focusing on recent advances in component and structure design, and summarizes the main strategies for constructing EMW-absorbing MXene-based aerogels. In addition, based on EMW absorption mechanisms and structure regulation strategies, the preparation methods and properties of MXene-based aerogels with various components and pore structures are detailed to advance understanding of composition-structure-performance relationships. Finally, the challenges and future development prospects of MXene-based aerogels for EMW absorption are discussed.
ABSTRACT
There is a growing demand for thermal management materials in electronics. Aerogels have attracted interest because of their extremely low density and extraordinary thermal insulation properties. However, their application is limited by high production costs and by the requirement that aerogel structures not be load-bearing. In this study, a mullite-reinforced SiC-based aerogel composite (MR-SiC AC) is prepared through 3D printing combined with in situ growth of SiC nanowires during post-processing. The fabricated MR-SiC AC combines ultra-low thermal conductivity (0.021 W m⁻¹ K⁻¹) and high porosity (90.0%) with a high Young's modulus (24.4 MPa) and high compressive strength (1.65 MPa), both of which exceed those of existing resilient aerogels by an order of magnitude. These properties make MR-SiC AC an ideal solution for precision thermal management of lightweight, geometrically complex structures in functional devices.
ABSTRACT
Aqueous flow batteries (AFBs) are promising long-duration energy storage systems owing to their intrinsic safety, inherent scalability, and ultralong cycle life. However, the graphite felt (GF) electrodes currently in use are thick (3-5 mm) and heavy (300-600 g m⁻²), which still limits the volumetric and gravimetric power density of AFBs. Herein, a lightweight (≈50 g m⁻²) and ultrathin (≈0.3 mm) carbon microtube electrode (CME), derived from a scalable one-step carbonization of commercial cotton cloth, is proposed. The unique loose woven structure composed of carbon microtubes endows the CME with excellent conductivity, abundant active sites, and enhanced electrolyte transport, thereby significantly reducing polarization in working AFBs. As a consequence, the CME demonstrates excellent cycling performance in pH-universal AFBs, including an acidic vanadium flow battery (maximum power density of 632.2 mW cm⁻²), a neutral Zn-I₂ flow battery (750 cycles with an average Coulombic efficiency of 99.6%), and an alkaline Zn-Fe flow battery (energy efficiency over 70% at 200 mA cm⁻²). More importantly, the estimated price of the CME is only 5% of that of GF (≈3 vs ≈60 $ m⁻²). Therefore, the lightweight and ultrathin CME can reasonably be expected to emerge as a next-generation electrode for AFBs.
ABSTRACT
Multi-drug combinations are gradually becoming an important approach to treating complex diseases because they can exploit synergistic effects among drugs. However, not all drug-drug interactions (DDIs) are beneficial. Accurate and rapid identification of DDIs is essential to enhance the effectiveness of combination therapy and avoid unintended side effects. Traditional DDI prediction methods use only drug sequence information or drug graph information, ignoring the positions of atoms and edges in the spatial structure. In this paper, we propose Molormer, a method based on a lightweight attention mechanism for DDI prediction. Molormer takes the two-dimensional (2D) structures of drugs as input and encodes the molecular graph with spatial information. Molormer then uses a lightweight attention mechanism and self-attention distilling to process the spatially encoded molecular graph, which retains the multi-headed attention mechanism while reducing computational and storage costs. Finally, Molormer adopts a Siamese network architecture, which makes full use of the limited data to train the model for better performance and also limits, to some extent, the differences between the networks that process the two drugs' features. Experiments show that our proposed method outperforms state-of-the-art methods in Accuracy, Precision, Recall, and F1 on a multi-label DDI dataset. In the case study section, we used Molormer to predict new interactions for the drugs Aliskiren, Selexipag, and Vorapaxar and validated some of the predictions. Code and models are available at https://github.com/IsXudongZhang/Molormer.
Subjects
Drug-Related Side Effects and Adverse Reactions , Drug Interactions , Humans
ABSTRACT
BACKGROUND: Spine MR image segmentation is an important foundation for computer-aided diagnostic (CAD) algorithms for spine disorders. Convolutional neural networks segment effectively but incur high computational costs. PURPOSE: To design a lightweight model based on a dynamic level-set loss function that achieves high segmentation performance. STUDY TYPE: Retrospective. POPULATION: Four hundred forty-eight subjects (3163 images) from two separate datasets. Dataset-1: 276 subjects/994 images (53.26% female, mean age 49.02 ± 14.09), all for disc degeneration screening; 188 had disc degeneration and 67 had herniated discs. Dataset-2: a public dataset with 172 subjects/2169 images, including 142 patients with vertebral degeneration and 163 patients with disc degeneration. FIELD STRENGTH/SEQUENCE: T2-weighted turbo spin echo sequences at 3T. ASSESSMENT: Dynamic Level-set Net (DLS-Net) was compared with four mainstream models (including U-net++) and four lightweight models, with manual labels made by five radiologists (vertebrae, discs, spinal fluid) used as the segmentation reference standard. Five-fold cross-validation was used for all experiments. Based on the segmentation, a CAD algorithm for lumbar discs was designed to assess DLS-Net's practicality, with text annotations (normal, bulging, or herniated) from medical history data used as the evaluation standard. STATISTICAL TESTS: All segmentation models were evaluated with DSC, accuracy, precision, and AUC. The pixel counts of the segmented results were compared with the manual labels using paired t-tests, with P < 0.05 indicating significance. The CAD algorithm was evaluated by the accuracy of lumbar disc diagnosis. RESULTS: With only 1.48% of the parameters of U-net++, DLS-Net achieved similar accuracy on both datasets (Dataset-1: DSC 0.88 vs. 0.89, AUC 0.94 vs. 0.94; Dataset-2: DSC 0.86 vs. 0.86, AUC 0.93 vs. 0.93). The segmentation results of DLS-Net showed no significant differences from the manual labels in pixel counts for discs (Dataset-1: 1603.30 vs. 1588.77, P = 0.22; Dataset-2: 863.61 vs. 886.4, P = 0.14) and vertebrae (Dataset-1: 3984.28 vs. 3961.94, P = 0.38; Dataset-2: 4806.91 vs. 4732.85, P = 0.21). Based on DLS-Net's segmentation results, the CAD algorithm achieved higher accuracy than with non-cropped MR images (87.47% vs. 61.82%). DATA CONCLUSION: The proposed DLS-Net has fewer parameters than U-net++ but achieves similar accuracy and helps the CAD algorithm achieve higher accuracy, which facilitates wider application. EVIDENCE LEVEL: 2 TECHNICAL EFFICACY: Stage 1.
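The DSC values reported above refer to the Dice similarity coefficient, DSC = 2|P ∩ T| / (|P| + |T|), which measures the overlap between a predicted mask P and a manual mask T. The following is a minimal sketch of that metric on small binary masks; the function name, the toy masks, and the convention for two empty masks are illustrative assumptions, not part of the study's code.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient (DSC) between two binary masks.

    Both masks are 2D lists of 0/1 values with the same number of pixels.
    DSC = 2 * |P ∩ T| / (|P| + |T|).
    """
    pred_flat = [v for row in pred for v in row]
    truth_flat = [v for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    total = sum(1 for p in pred_flat if p) + sum(1 for t in truth_flat if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (hypothetical data, not taken from either dataset).
predicted = [[0, 1, 1],
             [0, 1, 0]]
manual = [[0, 1, 0],
          [0, 1, 1]]
print(dice_coefficient(predicted, manual))
```

In this toy case each mask has three foreground pixels and two of them coincide, so the score is 4/6.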
Subjects
Image Processing, Computer-Assisted , Intervertebral Disc Degeneration , Humans , Female , Adult , Middle Aged , Male , Image Processing, Computer-Assisted/methods , Retrospective Studies , Intervertebral Disc Degeneration/diagnostic imaging , Neural Networks, Computer , Spine/diagnostic imaging
ABSTRACT
The fracture behaviour of concrete is studied in various micro- and macro-damage models, which is important for estimating the serviceability and stability of concrete structures. However, a detailed understanding of the material behaviour under load is often not available. To better interpret the fracture behaviour and pattern, images of lightweight concrete were taken using a high-resolution computed tomography (µ-CT) scanner. The samples were loaded between successive scans, and the load was kept constant during each measurement. This study describes the method used and how the data set was analysed to investigate displacements and cracks. It shows that displacements and damage to the concrete structure can be detected prior to failure, allowing conclusions to be drawn about the structural behaviour. In principle, µ-CT measurement can be used to examine different kinds of concrete as well as other systems with inorganic binders, and to compare the fracture behaviour of different systems.
ABSTRACT
BACKGROUND: Acute biliary pancreatitis (ABP) is a common acute abdominal condition in clinical practice. After the first episode of pancreatitis, the relapse rate is high, which seriously affects human life and health and imposes a heavy economic burden on families and society. Many studies have found endoscopic retrograde cholangiopancreatography (ERCP) to be an effective treatment. However, whether ERCP should be performed in the early stage of ABP remains controversial in clinical practice. METHODS: Related articles published from January 2000 onward were retrieved from PubMed, the Web of Science Core Collection, Nature, Science Direct, and other databases. The keywords included early ERCP, delayed ERCP, ABP, laparoscopy, and cholecystectomy, all of which were combined with "or" and "and". The language of the articles was not restricted during retrieval, and Review Manager 5.3 was used to perform the meta-analysis of experimental data. Finally, a total of 8 eligible articles were selected, including 8,801 patients. RESULTS: The meta-analysis showed no remarkable differences in the incidence of complications, mortality, or operation time between patients undergoing early ERCP and those receiving delayed ERCP. However, the hospitalization time of patients in the experimental group was notably shorter than that of patients in the control group. CONCLUSIONS: Early ERCP is as safe as delayed ERCP for biliary pancreatitis and can significantly shorten the hospital stay. Hence, the therapy is worthy of clinical promotion. These findings provide a reference and basis for the clinical treatment of relevant diseases.
Subjects
Cholangiopancreatography, Endoscopic Retrograde , Deep Learning , Pancreatitis , Humans , Cholangiopancreatography, Endoscopic Retrograde/methods , Pancreatitis/surgery , Pancreatitis/therapy , Pancreatitis/complications , Acute Disease , Length of Stay/statistics & numerical data , Treatment Outcome , Operative Time , Time-to-Treatment
ABSTRACT
OBJECTIVE: To evaluate the performance of two lightweight neural network models in the diagnosis of common fundus diseases and to compare them with two classical models. METHODS: A total of 16,000 color fundus photographs were collected, including 2000 each of glaucoma, diabetic retinopathy (DR), high myopia, central retinal vein occlusion (CRVO), age-related macular degeneration (AMD), optic neuropathy, and central serous chorioretinopathy (CSC), in addition to 2000 normal fundus photographs. The photographs were obtained from patients or individuals undergoing physical examinations at the Ophthalmology Department of Beijing Tongren Hospital, Capital Medical University. Each photograph was diagnosed and labeled by two professional ophthalmologists. Two classical classification models (ResNet152 and DenseNet121) and two lightweight classification models (MobileNetV3 and ShuffleNetV2) were trained. Area under the curve (AUC), sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were used to evaluate the performance of the four models. RESULTS: Compared with the classical classification models, the two lightweight classification models had a significantly smaller total size and number of parameters and a sharply higher classification speed. Compared with the DenseNet121 model, the ShuffleNetV2 model took 50.7% less time to make a diagnosis from a fundus photograph. The classical models performed better than the lightweight classification models, and DenseNet121 showed the highest AUC in five of the seven common fundus diseases. However, the performance of the lightweight classification models was satisfactory. The AUCs of the MobileNetV3 model for diagnosing AMD, diabetic retinopathy, glaucoma, CRVO, high myopia, optic atrophy, and CSC were 0.805, 0.892, 0.866, 0.812, 0.887, 0.868, and 0.803, respectively. For the ShuffleNetV2 model, the AUCs for the same seven diseases were 0.856, 0.893, 0.855, 0.884, 0.891, 0.867, and 0.844, respectively. CONCLUSION: Lightweight neural network models trained on color fundus photographs for the diagnosis of common fundus diseases are not only fast but also significantly smaller in storage size and parameter count than the classical classification models, and they can achieve satisfactory accuracy.
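Apart from AUC, the evaluation measures listed in the methods (sensitivity, specificity, accuracy, positive and negative predictive value) all follow from the four cells of a one-vs-rest confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Screening metrics from one-vs-rest confusion-matrix counts.

    tp/fn: diseased images classified as diseased / not diseased.
    tn/fp: non-diseased images classified as not diseased / diseased.
    """
    return {
        "sensitivity": tp / (tp + fn),              # true positive rate
        "specificity": tn / (tn + fp),              # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
    }


# Hypothetical counts for a single disease class (illustration only).
metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because predictive values depend on how common the disease is in the test set, figures obtained on a class-balanced collection would not necessarily carry over to a screening population with a different prevalence.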
Subjects
Diabetic Retinopathy , Glaucoma , Macular Degeneration , Myopia , Humans , Diabetic Retinopathy/diagnosis , Diagnostic Techniques, Ophthalmological , Fundus Oculi , Glaucoma/diagnosis , Macular Degeneration/diagnosis , Photography
ABSTRACT
OBJECTIVE: Alzheimer's disease (AD) is a neurological illness that significantly impacts individuals' daily lives. In the intelligent diagnosis of AD, 3D networks require more computational resources and storage space for training, which increases model complexity and training time. On the other hand, 2D slice analysis may overlook the 3D structural information of MRI, resulting in information loss. APPROACH: We propose a lightweight network with multi-slice attention fusion and multi-view personalized fusion for automated AD diagnosis. It incorporates a multi-branch lightweight backbone to extract features from the sagittal, axial, and coronal views of MRI, respectively. We introduce a novel multi-slice attention fusion module, which combines global and local channel attention mechanisms to ensure consistent classification across multiple slices. In addition, a multi-view personalized fusion module assigns appropriate weights to the three views, taking into account the varying significance of each view for accurate classification. To enhance the performance of the multi-view personalized fusion module, we use a label consistency loss to guide the model's learning process, which encourages more consistent and stable representations across all three views. MAIN RESULTS: The proposed method requires only 3.75M parameters and 4.45G FLOPs, and improves accuracy by 10.5% to 14% in three tasks. Moreover, in the classification tasks of AD vs. CN, AD vs. MCI, and MCI vs. CN, the accuracy of the proposed method is 95.63%, 86.88%, and 85.00%, respectively, which is superior to existing methods. CONCLUSIONS: The results show that the proposed approach not only excels in resource utilization but also significantly outperforms the four comparison methods in accuracy and sensitivity, particularly in detecting early-stage AD lesions. It can capture and accurately identify subtle brain lesions, providing crucial technical support for early intervention and treatment.
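The abstract does not specify how the multi-view personalized fusion module computes its weights. As a rough illustration of the general idea of weighting the sagittal, axial, and coronal branches, the sketch below fuses per-view class probabilities with softmax-normalized importance scores. Everything here (function names, the use of softmax, the scores, and the probabilities) is an assumption made for illustration and is not the authors' module.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_probs, view_scores):
    """Weighted average of per-view class probabilities.

    view_probs:  dict mapping view name -> list of class probabilities.
    view_scores: dict mapping view name -> raw importance score
                 (hypothetical; in a trained network these would be learned).
    Returns (fused probabilities, normalized weight per view).
    """
    names = sorted(view_probs)
    weights = softmax([view_scores[n] for n in names])
    n_classes = len(view_probs[names[0]])
    fused = [0.0] * n_classes
    for name, weight in zip(names, weights):
        for c in range(n_classes):
            fused[c] += weight * view_probs[name][c]
    return fused, dict(zip(names, weights))


# Hypothetical two-class outputs (e.g. AD vs. CN) from the three branches.
probs = {"sagittal": [0.6, 0.4], "axial": [0.8, 0.2], "coronal": [0.4, 0.6]}
equal = {"sagittal": 0.0, "axial": 0.0, "coronal": 0.0}
fused, weights = fuse_views(probs, equal)
print(fused, weights)
```

With equal scores the fusion reduces to a plain average of the three views; raising one view's score shifts the fused prediction toward that view.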
Subjects
Alzheimer Disease , Magnetic Resonance Imaging , Alzheimer Disease/diagnostic imaging , Humans , Magnetic Resonance Imaging/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional , Aged , Neural Networks, Computer
ABSTRACT
BACKGROUND: Due to the declining mortality rates of breast carcinoma and the rising incidence of risk-reducing mastectomies, enhancing quality of life after breast reconstruction has become an increasingly important goal. The advantages of lightweight breast implants (B-Lite®) may contribute significantly to achieving this objective. This study investigates whether lightweight implants are suitable for patients undergoing breast reconstruction and whether they could improve quality of life in comparison with conventional implants. METHODS: We retrospectively analyzed 48 patients (38 implants in each group) who underwent implant-based breast reconstruction with either B-Lite® or conventional breast implants between 2019 and 2022 at the University Center for Plastic Surgery in Regensburg. As part of the postoperative follow-up, a clinical examination and a survey using the Breast-Q® questionnaire were conducted to evaluate postoperative quality of life. RESULTS: The implants used were similar in weight and shape. On average, the B-Lite® implants had a higher implant volume, and patients in this group had a slightly higher BMI. Patients who received B-Lite® implants showed significantly better results for sensation in the surgical area, and scar formation also appeared to be more favorable. However, patients with B-Lite® implants perceived their implants as more uncomfortable than those with conventional breast implants. In other aspects of quality of life, the two groups appeared similar. CONCLUSION: Some confounding factors could have influenced certain outcomes of this study; these could not be avoided because of the retrospective study design and the temporary suspension of B-Lite® implants. Nevertheless, as the first of its kind, this study demonstrated that B-Lite® implants could also be suitable for use in breast reconstruction, providing an important foundation for further prospective studies.
Subjects
Breast Implantation , Breast Implants , Mammaplasty , Quality of Life , Humans , Female , Retrospective Studies , Middle Aged , Adult , Breast Implantation/instrumentation , Mammaplasty/psychology , Breast Neoplasms/surgery , Breast Neoplasms/psychology , Surveys and Questionnaires , Patient Satisfaction , Prosthesis Design
ABSTRACT
To meet the lightweight and real-time requirements of coal sorting detection, an intelligent detection method for coal and gangue, Our-v8, was proposed based on an improved YOLOv8. Images of coal and gangue with different densities were collected under two different lighting environments. A Laplacian image enhancement algorithm was then used to improve the quality of the training data by sharpening contours and boosting feature extraction; the CBAM attention mechanism was introduced to prioritize crucial features and make feature extraction more accurate; and the EIOU loss function was added to refine box regression, further improving detection accuracy. The experimental results showed that, in a halogen lamp lighting environment, Our-v8 detected coal and gangue with a mean average precision (mAP) of 99.5% while remaining lightweight, with FLOPs of 29.7, Param of 12.8, and a size of only 22.1 MB. Additionally, Our-v8 can provide accurate location information for coal and gangue, making it well suited to real-time coal sorting applications.
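The abstract does not give the details of its Laplacian enhancement step. A common form of Laplacian sharpening subtracts the 4-neighbour Laplacian from each pixel, which steepens intensity edges such as object contours. The sketch below implements that textbook variant on a grayscale image stored as nested lists; the kernel choice, the edge-replication border handling, and the clipping range are assumptions, not the authors' settings.

```python
def laplacian_sharpen(image):
    """Sharpen a grayscale image (2D list, values 0-255).

    Uses the 4-neighbour Laplacian:
        lap = up + down + left + right - 4 * centre
        sharpened = centre - lap
    Border pixels reuse their nearest valid neighbour (edge replication).
    """
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so out-of-range lookups replicate the border.
        r = min(max(r, 0), height - 1)
        c = min(max(c, 0), width - 1)
        return image[r][c]

    output = []
    for r in range(height):
        row = []
        for c in range(width):
            lap = (pixel(r - 1, c) + pixel(r + 1, c)
                   + pixel(r, c - 1) + pixel(r, c + 1)
                   - 4 * pixel(r, c))
            # Clip to the valid 8-bit intensity range.
            row.append(min(255, max(0, image[r][c] - lap)))
        output.append(row)
    return output


# A one-row step edge: the dark side gets darker, the bright side brighter.
print(laplacian_sharpen([[50, 50, 200, 200]]))
```

Flat regions have a zero Laplacian and pass through unchanged, so only contours are emphasized.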
ABSTRACT
To address the large model size, slow processing speed, and difficulty of deployment on edge terminals, this paper proposes a lightweight insulator detection algorithm based on an improved SSD. First, the original feature extraction network, VGG-16, is replaced by a lightweight Ghost Module network as an initial step toward a lightweight model. A Feature Pyramid Network and Path Aggregation Network (FPN+PAN) structure is integrated into the Neck, and a Simplified Spatial Pyramid Pooling Fast (SimSPPF) module is introduced to fuse local and global features. Second, multiple Spatial and Channel Squeeze-and-Excitation (scSE) attention mechanisms are introduced in the Neck so that the model pays more attention to the channels containing important feature information. The original six detection heads are reduced to four to improve the inference speed of the network. To improve recognition of occluded and overlapping targets, DIoU-NMS replaces the original non-maximum suppression (NMS). Furthermore, channel pruning is used to remove unimportant weights from the model, and knowledge distillation is used to fine-tune the pruned network so as to preserve detection accuracy. The experimental results show that the number of parameters of the proposed model is reduced from 26.15 M to 0.61 M, the computational load is reduced from 118.95 G to 1.49 G, and the mAP is increased from 96.8% to 98%. Compared with other models, the proposed model not only maintains detection accuracy but also greatly reduces model size, which supports visible-light insulator detection based on edge intelligence.
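DIoU-NMS differs from plain NMS in its suppression criterion: instead of IoU alone it uses DIoU = IoU − d²/c², where d is the distance between the two box centres and c is the diagonal of the smallest box enclosing both. Overlapping boxes whose centres are far apart are therefore less likely to be suppressed, which helps with occluded targets. The sketch below is a minimal pure-Python version; the function names, the 0.5 threshold, and the example boxes are illustrative assumptions, not the paper's configuration.

```python
def diou(a, b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2) with positive area."""
    # Intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between the two box centres.
    d2 = (((a[0] + a[2]) - (b[0] + b[2])) ** 2
          + ((a[1] + a[3]) - (b[1] + b[3])) ** 2) / 4.0
    # Squared diagonal of the smallest box enclosing both.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw ** 2 + ch ** 2
    return iou - d2 / c2


def diou_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes whose DIoU with the kept box is too high.
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep


# Hypothetical detections: two near-duplicates and one distant box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))
```

Here the second box is suppressed as a near-duplicate of the first, while the distant third box survives.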
ABSTRACT
Facial expression recognition using convolutional neural networks (CNNs) is a prevalent research area, but network complexity hinders deployment on devices with limited computational resources, such as mobile devices. To address this challenge, researchers have developed lightweight networks that aim to reduce model size and parameter count without compromising accuracy. The LiteFer method introduced in this study incorporates depthwise separable convolution and a lightweight attention mechanism, effectively reducing network parameters. Comprehensive comparative experiments on the RAF-DB and FERPlus datasets show that it outperforms various state-of-the-art lightweight expression-recognition methods.
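The parameter saving from depthwise separable convolution can be checked with simple arithmetic. A standard k × k convolution needs k·k·C_in·C_out weights, whereas a depthwise k × k convolution followed by a 1 × 1 pointwise convolution needs k·k·C_in + C_in·C_out. The sketch below compares the two, ignoring biases; the layer sizes are hypothetical and are not LiteFer's actual configuration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = standard_conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
print(standard, separable, separable / standard)
```

The ratio equals 1/C_out + 1/k², so for 3 × 3 kernels and wide layers the separable form uses roughly one ninth of the weights.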
Subjects
Neural Networks, Computer , Humans , Algorithms , Facial Expression , Pattern Recognition, Automated/methods
ABSTRACT
As an important direction in computer vision, human pose estimation has received extensive attention in recent years. The High-Resolution Network (HRNet), a classical human pose estimation method, achieves effective estimation results. However, the complex structure of the model is not conducive to deployment under limited computing resources. Therefore, an improved Efficient and Lightweight HRNet (EL-HRNet) model is proposed. Specifically, point-wise and grouped convolutions are used to construct a lightweight residual module that replaces the original 3 × 3 module to reduce the number of parameters. To compensate for the information loss caused by making the network lightweight, the Convolutional Block Attention Module (CBAM) is introduced after the new lightweight residual module to form the Lightweight Attention Basicblock (LA-Basicblock) module, which achieves high-precision human pose estimation. To verify the effectiveness of the proposed EL-HRNet, experiments were carried out on the COCO2017 and MPII datasets. The experimental results show that the EL-HRNet model requires only 5 million parameters and 2.0 GFLOPs of computation and achieves an AP score of 67.1% on the COCO2017 validation set. In addition, the mean PCKh@0.5 is 87.7% on the MPII validation set, and EL-HRNet shows a good balance between model complexity and human pose estimation accuracy.
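PCKh@0.5 counts a predicted keypoint as correct when it lies within 0.5 times a head-size reference length of the ground-truth location, and reports the fraction of correct keypoints. The sketch below shows that calculation for a single person; the function name, the coordinates, and the head size are hypothetical, and how the head size is derived from dataset annotations is left to the caller.

```python
import math


def pckh(predicted, ground_truth, head_size, alpha=0.5):
    """Fraction of keypoints within alpha * head_size of the ground truth.

    predicted, ground_truth: equal-length lists of (x, y) keypoints.
    head_size: reference head length in pixels (supplied by the caller).
    """
    threshold = alpha * head_size
    correct = 0
    for (px, py), (tx, ty) in zip(predicted, ground_truth):
        # Euclidean distance between prediction and annotation.
        if math.hypot(px - tx, py - ty) <= threshold:
            correct += 1
    return correct / len(ground_truth)


# Hypothetical keypoints for one person, head size 10 px (threshold 5 px).
truth = [(10, 10), (20, 20), (30, 30), (40, 40)]
preds = [(11, 10), (20, 26), (30, 31), (47, 40)]
print(pckh(preds, truth, head_size=10))
```

The prediction errors here are 1, 6, 1, and 7 pixels, so two of the four keypoints fall within the 5-pixel threshold.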
ABSTRACT
X-ray images typically contain complex background information and abundant small objects, posing significant challenges for object detection in security tasks. Most existing object detection methods rely on complex networks with high computational costs, which makes lightweight models difficult to realize. This article proposes Fine-YOLO to achieve rapid and accurate detection in the security domain. First, a low-parameter feature aggregation (LPFA) structure is designed for the backbone feature network of YOLOv7 so that it learns more information with a lighter structure. Second, a high-density feature aggregation (HDFA) structure is proposed to address the loss of local details and deep location information caused by the neck feature fusion network in YOLOv7-Tiny-SiLU; it connects cross-level features through max-pooling. Third, the Normalized Wasserstein Distance (NWD) method is employed to alleviate the convergence difficulty that results from the extreme sensitivity of bounding box regression to small objects. The proposed Fine-YOLO model is evaluated on the EDS dataset, achieving a detection accuracy of 58.3% with only 16.1 M parameters. In an auxiliary validation on the NEU-DET dataset, the detection accuracy reaches 73.1%. Experimental results show that Fine-YOLO is not only suitable for security inspection but can also be extended to other inspection areas.
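The Normalized Wasserstein Distance models each box (cx, cy, w, h) as a 2D Gaussian and compares boxes through the 2-Wasserstein distance between those Gaussians, mapped to (0, 1] with an exponential. Unlike IoU it stays informative when two small boxes do not overlap at all. The sketch below uses the commonly cited closed form; the normalizing constant `c` is dataset-dependent, and the default used here is only a placeholder assumption.

```python
import math


def nwd(box_a, box_b, c=5.0):
    """Normalized Wasserstein Distance between two boxes (cx, cy, w, h).

    Each box is treated as a Gaussian with mean (cx, cy) and standard
    deviations (w / 2, h / 2). The squared 2-Wasserstein distance is then
    the squared Euclidean distance between (cx, cy, w/2, h/2) vectors.
    c is a normalizing constant (placeholder value, to be tuned per dataset).
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    w2_squared = ((ax - bx) ** 2 + (ay - by) ** 2
                  + ((aw - bw) / 2.0) ** 2 + ((ah - bh) / 2.0) ** 2)
    return math.exp(-math.sqrt(w2_squared) / c)


# Two hypothetical 4 x 4 boxes whose centres are 5 px apart: they do not
# overlap (IoU = 0), yet NWD still gives a graded similarity.
print(nwd((10, 10, 4, 4), (13, 14, 4, 4)))
print(nwd((10, 10, 4, 4), (10, 10, 4, 4)))
```

Identical boxes score exactly 1, and the score decays smoothly as centres move apart or sizes diverge.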
ABSTRACT
Vehicle detection is a research direction in the field of target detection and is widely used in intelligent transportation, autonomous driving, urban planning, and other fields. To balance the speed advantage of lightweight networks and the precision advantage of multiscale networks, a vehicle detection algorithm based on a lightweight backbone network and a multiscale neck network is proposed. The MobileNetV3 lightweight network, based on depthwise separable convolution, is used as the backbone network to improve the speed of vehicle detection. The icbam attention mechanism module is used to strengthen the processing of the vehicle feature information extracted by the backbone network and so enrich the input of the neck network. The BiFPN and icbam attention mechanism modules are integrated into the neck network to improve detection accuracy for vehicles of different sizes and categories. A vehicle detection experiment on the UA-DETRAC dataset verifies that the proposed algorithm effectively balances vehicle detection accuracy and speed. The detection accuracy is 71.19%, the number of parameters is 3.8 MB, and the detection speed is 120.02 fps, which meets the practical requirements on parameter count, detection speed, and accuracy for a vehicle detection algorithm embedded in a mobile device.
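The abstract does not describe how its BiFPN neck combines feature maps. In the original BiFPN design, same-sized feature maps are merged by "fast normalized fusion": each input gets a learnable non-negative scalar weight, and the weighted sum is divided by the sum of the weights plus a small epsilon. The sketch below illustrates that single fusion step on tiny nested-list feature maps; the function name, the weights, and the features are hypothetical, and a real neck would also resize inputs and apply convolutions.

```python
def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Fuse equally sized 2D feature maps with normalized scalar weights.

    features:    list of 2D lists, all with the same shape.
    raw_weights: one scalar per feature map (would be learned in practice).
    """
    # ReLU keeps every weight non-negative so the normalization is stable.
    weights = [max(0.0, w) for w in raw_weights]
    norm = eps + sum(weights)
    rows, cols = len(features[0]), len(features[0][0])
    return [
        [
            sum(w * f[r][c] for w, f in zip(weights, features)) / norm
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Two hypothetical 1 x 2 feature maps from different pyramid levels.
shallow = [[1.0, 2.0]]
deep = [[3.0, 4.0]]
print(fast_normalized_fusion([shallow, deep], [1.0, 1.0], eps=0.0))
```

This avoids the exponentials of a softmax while still keeping the effective weights between 0 and 1.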
ABSTRACT
Convolutional neural networks (CNNs) have made significant progress in facial expression recognition (FER). However, because of occlusion, lighting variations, and changes in head pose, FER in real-world environments remains highly challenging. At the same time, methods based solely on CNNs rely heavily on local spatial features, lack global information, and struggle to balance computational complexity against recognition accuracy. Consequently, CNN-based models still fall short in addressing FER adequately. To address these issues, we propose a lightweight FER method based on a hybrid vision transformer. The method captures multi-scale facial features through an improved attention module, achieving richer feature integration, enhancing the network's perception of key facial expression regions, and improving feature extraction. Additionally, to further enhance the model's performance, we designed a patch dropping (PD) module. This module emulates the way the human visual system allocates attention to local features, guiding the network to focus on the most discriminative features, reducing the influence of irrelevant features, and lowering computational costs. Extensive experiments demonstrate that our approach significantly outperforms other methods, achieving an accuracy of 86.51% on RAF-DB and nearly 70% on FER2013, with a model size of only 3.64 MB. These results demonstrate that our method provides a new perspective for the field of facial expression recognition.
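The abstract does not say how the patch dropping (PD) module selects patches. One simple way to realize the general idea is to score every patch and keep only the highest-scoring fraction, so later transformer layers process fewer tokens. The sketch below shows that top-k selection; the scoring values, the keep ratio, and the function name are assumptions for illustration and should not be read as the authors' PD module.

```python
def drop_patches(patches, scores, keep_ratio=0.5):
    """Keep the highest-scoring patches, preserving their original order.

    patches:    list of patch tokens (any objects).
    scores:     one importance score per patch (hypothetical; in a real
                model these might come from attention weights).
    keep_ratio: fraction of patches to keep (at least one is always kept).
    Returns (kept patches, kept indices).
    """
    n_keep = max(1, int(len(patches) * keep_ratio))
    ranked = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore spatial order
    return [patches[i] for i in kept], kept


# Four hypothetical patch tokens with made-up importance scores.
tokens = ["p0", "p1", "p2", "p3"]
importance = [0.10, 0.70, 0.05, 0.40]
kept_tokens, kept_idx = drop_patches(tokens, importance, keep_ratio=0.5)
print(kept_tokens, kept_idx)
```

Because the self-attention matrix grows with the square of the token count, keeping half of the patches shrinks that matrix to about a quarter of its original size.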