Search | VHL Search Portal

1.

Fusing 2D and 3D molecular graphs as unambiguous molecular descriptors for conformational and chiral stereoisomers.

Du, Wenjie; Yang, Xiaoting; Wu, Di; Ma, FenFen; Zhang, Baicheng; Bao, Chaochao; Huo, Yaoyuan; Jiang, Jun; Chen, Xin; Wang, Yang.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36528804

ABSTRACT

The rapid progress of machine learning (ML) in predicting molecular properties enables high-precision predictions to be routinely achieved. However, many ML models, such as those based on conventional molecular graphs, cannot differentiate stereoisomers of certain types, particularly conformational and chiral ones that share the same bonding connectivity but differ in spatial arrangement. Here, we designed a hybrid molecular graph network, Chemical Feature Fusion Network (CFFN), to address the issue by integrating planar and stereo information of molecules in an interweaved fashion. The three-dimensional (3D, i.e., stereo) modality guarantees precision and completeness by providing unabridged information, while the two-dimensional (2D, i.e., planar) modality brings in chemical intuitions as prior knowledge for guidance. The zipper-like arrangement of 2D and 3D information processing promotes cooperativity between them, and their synergy is the key to our model's success. Experiments on various molecular or conformational datasets, including a newly created chiral molecule dataset comprising various configurations and conformations, demonstrate the superior performance of CFFN. The advantage of CFFN is even more significant in datasets made of small samples. Ablation experiments confirm that fusing 2D and 3D molecular graphs as unambiguous molecular descriptors can not only effectively distinguish molecules and their conformations, but also achieve more accurate and robust prediction of quantum chemical properties.

Subject(s)

Machine Learning , Stereoisomerism , Molecular Conformation

2.

Transfer Learning on Small Datasets for Improved Fall Detection.

Maray, Nader; Ngu, Anne Hee; Ni, Jianyuan; Debnath, Minakshi; Wang, Lu.

Sensors (Basel) ; 23(3)2023 Jan 18.

Article in English | MEDLINE | ID: mdl-36772148

ABSTRACT

Falls in the elderly are associated with significant morbidity and mortality. While numerous fall detection devices incorporating AI and machine learning algorithms have been developed, no known smartwatch-based system has been used successfully in real time to detect falls for elderly persons. We have developed and deployed a SmartFall system on a commodity-based smartwatch which has been trialled by nine elderly participants. The system, while being usable and welcomed by the participants in our trials, has two serious limitations. The first limitation is the inability to collect a large amount of personalized data for training. When the fall detection model, which is trained with insufficient data, is used in the real world, it generates a large number of false positives. The second limitation is the model drift problem. This means an accurate model trained using data collected with a specific device performs sub-par when used on another device. Therefore, building one model for each type of device/watch is not a scalable approach for developing smartwatch-based fall detection systems. To tackle those issues, we first collected three datasets including accelerometer data for the fall detection problem from different devices: the Microsoft watch (MSBAND), the Huawei watch, and the meta-sensor device. We then applied a transfer learning strategy to overcome the small dataset training problem for fall detection. We also demonstrated the use of transfer learning to generalize the model across the heterogeneous devices. Our preliminary experiments demonstrate the effectiveness of transfer learning for improving fall detection, achieving an F1 score higher by over 10% on average, an AUC higher by over 0.15 on average, and a smaller false positive prediction rate than the non-transfer learning approach across various datasets collected using different devices with different hardware specifications.

Subject(s)

Accidental Falls , Machine Learning , Humans , Aged , Accidental Falls/prevention & control , Algorithms
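
The F1 score and false positive rate reported in the abstract above are both derived from confusion-matrix counts. The following is a minimal sketch of those two metrics, not the authors' evaluation code; the function name f1_and_fpr and the labels are invented for illustration (1 = fall, 0 = no fall).

```python
def f1_and_fpr(y_true, y_pred):
    """Return (F1 score, false positive rate) for binary labels, 1 = fall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # falls correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed falls
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct rejections
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


# Made-up labels for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
f1, fpr = f1_and_fpr(y_true, y_pred)
print(f"F1 = {f1:.3f}, FPR = {fpr:.3f}")
```

Because activity data usually contain far more non-fall than fall windows, F1 and the false positive rate tend to be more informative here than plain accuracy.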

3.

Automatic segmentation of thyroid with the assistance of the devised boundary improvement based on multicomponent small dataset.

Chen, Yifei; Zhang, Xin; Li, Dandan; Park, HyunWook; Li, Xinran; Liu, Peng; Jin, Jing; Shen, Yi.

Appl Intell (Dordr) ; : 1-16, 2023 Mar 15.

Article in English | MEDLINE | ID: mdl-37363389

ABSTRACT

Deep learning has been widely considered in medical image segmentation. However, the difficulty of acquiring medical images and labels can affect the accuracy of the segmentation results for deep learning methods. In this paper, an automatic segmentation method is proposed by devising a multicomponent neighborhood extreme learning machine to improve the boundary attention region of the preliminary segmentation results. The neighborhood features are acquired by training U-Nets with the multicomponent small dataset, which consists of original thyroid ultrasound images, Sobel edge images and superpixel images. Afterward, the neighborhood features are selected by min-redundancy and max-relevance filter in the designed extreme learning machine, and the selected features are used to train the extreme learning machine to obtain supplementary segmentation results. Finally, the accuracy of the segmentation results is improved by adjusting the boundary attention region of the preliminary segmentation results with the supplementary segmentation results. This method combines the advantages of deep learning and traditional machine learning, boosting the accuracy of thyroid segmentation with a small dataset in a multigroup test.

4.

Prediction reliability of QSAR models: an overview of various validation tools.

De, Priyanka; Kar, Supratik; Ambure, Pravin; Roy, Kunal.

Arch Toxicol ; 96(5): 1279-1295, 2022 05.

Article in English | MEDLINE | ID: mdl-35267067

ABSTRACT

The reliability of any quantitative structure-activity relationship (QSAR) model depends on multiple aspects such as the accuracy of the input dataset, selection of significant descriptors, the appropriate splitting process of the dataset, statistical tools used, and most notably on the measures of validation. Validation, the most crucial step in QSAR model development, confirms the reliability of the developed QSAR models and the acceptability of each step in the model development. The present review deals with various validation tools that involve multiple techniques that improve the model quality and robustness. The double cross-validation tool helps in building improved quality models using different combinations of the same training set in an inner cross-validation loop. This exhaustive method is also integrated for small datasets (< 40 compounds) in another tool, namely the small dataset modeler tool. The main aim of QSAR researchers is to improve prediction quality by lowering the prediction errors for the query compounds. 'Intelligent' selection of multiple models and consensus predictions integrated in the intelligent consensus predictor tool were found to be more externally predictive than individual models. Furthermore, another tool called Prediction Reliability Indicator was explained to understand the quality of predictions for a true external set. This tool uses a composite scoring technique to identify query compounds as 'good' or 'moderate' or 'bad' predictions. We have also discussed a quantitative read-across tool which predicts a chemical response based on the similarity with structural analogues. The discussed tools are freely available from https://dtclab.webs.com/software-tools or http://teqip.jdvu.ac.in/QSAR_Tools/DTCLab/ and https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home (for read-across).

Subject(s)

Quantitative Structure-Activity Relationship , Reproducibility of Results

5.

Scene Classification for Sports Video Summarization Using Transfer Learning.

Rafiq, Muhammad; Rafiq, Ghazala; Agyeman, Rockson; Jin, Seong-Il; Choi, Gyu Sang.

Sensors (Basel) ; 20(6)2020 Mar 18.

Article in English | MEDLINE | ID: mdl-32197502

ABSTRACT

This paper proposes a novel method for sports video scene classification with the particular intention of video summarization. Creating and publishing a shorter version of the video is more interesting than a full version due to instant entertainment. Generating shorter summaries of the videos is a tedious task that requires significant labor hours and unnecessary machine occupation. Due to the growing demand for video summarization in marketing, advertising agencies, awareness videos, documentaries, and other interest groups, researchers are continuously proposing automation frameworks and novel schemes. Since scene classification is a fundamental component of video summarization and video analysis, the quality of scene classification is particularly important. This article focuses on various practical implementation gaps in the existing techniques and presents a method to achieve high-quality scene classification. We consider cricket as a case study and classify five scene categories, i.e., batting, bowling, boundary, crowd and close-up. We employ our model using pre-trained AlexNet Convolutional Neural Network (CNN) for scene classification. The proposed method employs new, fully connected layers in an encoder fashion. We employ data augmentation to achieve a high accuracy of 99.26% over a smaller dataset. We conduct a performance comparison against baseline approaches to prove the superiority of the method as well as state-of-the-art models. We evaluate our performance results on cricket videos and compare various deep-learning models, i.e., Inception V3, Visual Geometry Group (VGGNet16, VGGNet19), Residual Network (ResNet50), and AlexNet. Our experiments demonstrate that our method with AlexNet CNN produces better results than existing proposals.

6.

Baseline Methods for Bayesian Inference in Gumbel Distribution.

Martín, Jacinto; Parra, María Isabel; Pizarro, Mario Martínez; Sanjuán, Eva L.

Entropy (Basel) ; 22(11)2020 Nov 07.

Article in English | MEDLINE | ID: mdl-33287035

ABSTRACT

Usual estimation methods for the parameters of extreme value distributions only employ a small part of the observation values. When block maxima values are considered, many data are discarded, and therefore a lot of information is wasted. We develop a model to seize the whole data available in an extreme value framework. The key is to take advantage of the existing relation between the baseline parameters and the parameters of the block maxima distribution. We propose two methods to perform Bayesian estimation. Baseline distribution method (BDM) consists in computing estimations for the baseline parameters with all the data, and then making a transformation to compute estimations for the block maxima parameters. Improved baseline method (IBDM) is a refinement of the initial idea, with the aim of assigning more importance to the block maxima data than to the baseline values, performed by applying BDM to develop an improved prior distribution. We compare empirically these new methods with the Standard Bayesian analysis with non-informative prior, considering three baseline distributions that lead to a Gumbel extreme distribution, namely Gumbel, Exponential and Normal, by a broad simulation study.
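
The "existing relation between the baseline parameters and the parameters of the block maxima distribution" has a closed form when the baseline itself is Gumbel: the maximum of k independent Gumbel(mu, sigma) draws is Gumbel(mu + sigma ln k, sigma). The sketch below checks that textbook identity numerically. It is an illustration under that assumption only, not the BDM or IBDM procedure; the function names and parameter values are made up.

```python
import math


def gumbel_cdf(x, mu, sigma):
    """CDF of a Gumbel distribution with location mu and scale sigma."""
    return math.exp(-math.exp(-(x - mu) / sigma))


def block_maxima_params(mu, sigma, k):
    """Map Gumbel baseline parameters to those of the maximum of k i.i.d. draws."""
    return mu + sigma * math.log(k), sigma


mu, sigma, k = 2.0, 1.5, 30  # illustrative values, not taken from the paper
mu_k, sigma_k = block_maxima_params(mu, sigma, k)

x = 8.0
lhs = gumbel_cdf(x, mu, sigma) ** k   # P(max of k baseline draws <= x)
rhs = gumbel_cdf(x, mu_k, sigma_k)    # same probability from the transformed parameters
print(lhs, rhs)
```

For the Exponential and Normal baselines mentioned in the abstract, the Gumbel form holds only asymptotically, so the corresponding parameter maps are approximations rather than exact identities.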

7.

Deep learning based on small sample dataset: prediction of dielectric properties of SrTiO₃-type perovskite with doping modification.

Luo, Quan; Hao, Hua; Liu, Hanxing.

R Soc Open Sci ; 11(5): 231464, 2024 May.

Article in English | MEDLINE | ID: mdl-39076810

ABSTRACT

The perovskite crystal structure represents a semiconductor material poised for widespread application, underpinned by attributes encompassing heightened efficiency, cost-effectiveness and remarkable flexibility. Notably, strontium titanate (SrTiO₃)-type perovskite, a prototypical ferroelectric dielectric material, has emerged as a pre-eminent matrix material for enhancing the energy storage capacity of perovskite. Typically, the strategy involves augmenting its dielectric constant through doping to enhance energy storage density. However, SrTiO₃ doping data are plagued by significant dispersion, and the small sample size poses a formidable research hurdle, hindering the investigation of dielectric property and energy storage density enhancements. This study endeavours to address this challenge. Our foundation lies in the compilation of 200 experimental records related to SrTiO₃-type perovskite doping, constituting a small dataset. Subsequently, an interactive framework harnesses deep neural network models and a one-dimensional convolutional neural network model to predict and scrutinize the dataset. Distinctively, the mole percentage of doping elements exclusively serves as input features, yielding significantly enhanced accuracy in dielectric performance prediction. Lastly, rigorous comparisons with traditional machine learning models, specifically gradient boosting regression, validate the superiority and reliability of deep learning models. This research advances a novel, effective methodology and offers a valuable reference for designing and optimizing perovskite energy storage materials.

8.

Usformer: A small network for left atrium segmentation of 3D LGE MRI.

Lin, Hui; López-Tapia, Santiago; Schiffers, Florian; Wu, Yunan; Gunasekaran, Suvai; Hwang, Julia; Bishara, Dima; Kholmovski, Eugene; Elbaz, Mohammed; Passman, Rod S; Kim, Daniel; Katsaggelos, Aggelos K.

Heliyon ; 10(7): e28539, 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-38596055

ABSTRACT

Left atrial (LA) fibrosis plays a vital role as a mediator in the progression of atrial fibrillation. 3D late gadolinium-enhancement (LGE) MRI has been proven effective in identifying LA fibrosis. Image analysis of 3D LA LGE involves manual segmentation of the LA wall, which is both lengthy and challenging. Automated segmentation poses challenges owing to the diverse intensities in data from various vendors, the limited contrast between LA and surrounding tissues, and the intricate anatomical structures of the LA. Current approaches relying on 3D networks are computationally intensive since 3D LGE MRIs and the networks are large. Regarding this issue, most researchers came up with two-stage methods: initially identifying the LA center using a scaled-down version of the MRIs and subsequently cropping the full-resolution MRIs around the LA center for final segmentation. We propose a lightweight transformer-based 3D architecture, Usformer, designed to precisely segment LA volume in a single stage, eliminating error propagation associated with suboptimal two-stage training. The transposed attention facilitates capturing the global context in large 3D volumes without significant computation requirements. Usformer outperforms the state-of-the-art supervised learning methods in terms of accuracy and speed. First, with the smallest Hausdorff Distance (HD) and Average Symmetric Surface Distance (ASSD), it achieved a dice score of 93.1% and 92.0% in the 2018 Atrial Segmentation Challenge and our local institutional dataset, respectively. Second, the number of parameters and computation complexity are largely reduced by 2.8x and 3.8x, respectively. Moreover, Usformer does not require a large dataset. When only 16 labeled MRI scans are used for training, Usformer achieves a 92.1% dice score in the challenge dataset. The proposed Usformer delineates the boundaries of the LA wall relatively accurately, which may assist in the clinical translation of LA LGE for planning catheter ablation of atrial fibrillation.
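
The dice score quoted in the abstract above measures the overlap between a predicted mask and a reference mask, Dice = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that definition on flattened binary masks; the function name and toy masks are invented for illustration and this is not the challenge's evaluation code.

```python
def dice_score(pred, target):
    """Dice overlap of two binary masks given as equal-length sequences of 0/1."""
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly, so define their Dice score as 1.0.
    return 2 * intersection / total if total else 1.0


# Toy flattened masks (1 = voxel labelled as left atrium); not real data.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
score = dice_score(pred, target)
print(f"Dice = {score:.3f}")
```

A 3D volume can be scored the same way after flattening its voxels into one sequence.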

9.

A Comparative Investigation of Machine Learning Algorithms for Pore-Influenced Fatigue Life Prediction of Additively Manufactured Inconel 718 Based on a Small Dataset.

Hu, Bing-Li; Luo, Yan-Wen; Zhang, Bin; Zhang, Guang-Ping.

Materials (Basel) ; 16(19)2023 Oct 09.

Article in English | MEDLINE | ID: mdl-37834743

ABSTRACT

Fatigue life prediction of Inconel 718 fabricated by laser powder bed fusion was investigated using a miniature specimen tests method and machine learning algorithms. A small dataset-based machine learning framework integrating thirteen kinds of algorithms was constructed to predict the pore-influenced fatigue life. The method of selecting random seeds was employed to evaluate the performance of the algorithms, and then the ranking of various machine learning algorithms for predicting pore-influenced fatigue life on small datasets was obtained by verifying the prediction model twenty or thirty times. The results showed that among the thirteen popular machine learning algorithms investigated, the adaptive boosting algorithm from the boosting category exhibited the best fitting accuracy for fatigue life prediction of the additively manufactured Inconel 718 using the small dataset, followed by the decision tree algorithm in the nonlinear category. The investigation also found that DT, RF, GBDT, and XGBOOST algorithms could effectively predict the fatigue life of the additively manufactured Inconel 718 within the range of 1 × 10⁵ cycles on a small dataset compared to others. These results not only demonstrate the capability of using small dataset-based machine learning techniques to predict fatigue life but also may guide the selection of algorithms that minimize performance evaluation costs when predicting fatigue life.

10.

Swin MAE: Masked autoencoders for small datasets.

Xu, Zi'an; Dai, Yin; Liu, Fayu; Chen, Weibing; Liu, Yue; Shi, Lifu; Liu, Sheng; Zhou, Yuhang.

Comput Biol Med ; 161: 107037, 2023 07.

Article in English | MEDLINE | ID: mdl-37230020

ABSTRACT

The development of deep learning models in medical image analysis is majorly limited by the lack of large-sized and well-annotated datasets. Unsupervised learning does not require labels and is more suitable for solving medical image analysis problems. However, most unsupervised learning methods must be applied to large datasets. To make unsupervised learning applicable to small datasets, we proposed Swin MAE, a masked autoencoder with Swin Transformer as its backbone. Even on a dataset of only a few thousand medical images, Swin MAE can still learn useful semantic features purely from images without using any pre-trained models. It can equal or even slightly outperform the supervised model obtained by Swin Transformer trained on ImageNet in the transfer learning results of downstream tasks. Compared to MAE, Swin MAE brought a performance improvement of twice and five times for downstream tasks on BTCV and our parotid dataset, respectively. The code is publicly available at https://github.com/Zian-Xu/Swin-MAE.

Subject(s)

Parotid Gland , Problem Solving , Semantics

11.

Deep learning strategy for small dataset from atomic force microscopy mechano-imaging on macrophages phenotypes.

Wu, Hao; Zhang, Lei; Zhao, Banglei; Yang, Wenjie; Galluzzi, Massimiliano.

Front Bioeng Biotechnol ; 11: 1259979, 2023.

Article in English | MEDLINE | ID: mdl-37860624

ABSTRACT

The cytoskeleton is involved during movement, shaping, resilience, and functionality in immune system cells. Biomarkers such as elasticity and adhesion can be promising alternatives to detect the status of cells upon phenotype activation in correlation with functionality. For instance, professional immune cells such as macrophages undergo phenotype functional polarization, and their biomechanical behaviors can be used as indicators for early diagnostics. For this purpose, combining the biomechanical sensitivity of atomic force microscopy (AFM) with the automation and performance of a deep neural network (DNN) is a promising strategy to distinguish and classify different activation states. To resolve the issue of small datasets in AFM-typical experiments, nanomechanical maps were divided into pixels with additional localization data. On such an enlarged dataset, a DNN was trained by multimodal fusion, and the prediction was obtained by voting classification. Without using conventional biomarkers, our algorithm demonstrated high performance in predicting the phenotype of macrophages. Moreover, permutation feature importance was employed to interpret the results and unveil the importance of different biophysical properties and, in turn, correlated this with the local density of the cytoskeleton. While our results were demonstrated on the RAW264.7 model cell line, we expect that our methodology could be opportunely customized and applied to distinguish different cell systems and correlate feature importance with biophysical properties to unveil innovative markers for diagnostics.
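
The abstract above says that each nanomechanical map is split into pixels and that the final prediction is "obtained by voting classification". It does not specify the voting rule, so the sketch below assumes plain majority voting over per-pixel predictions; the labels "A", "B" and "C" are placeholders, not the phenotype names used in the study.

```python
from collections import Counter


def vote(pixel_predictions):
    """Return the most frequent per-pixel label as the cell-level prediction."""
    return Counter(pixel_predictions).most_common(1)[0][0]


# Hypothetical per-pixel outputs for one cell (placeholder labels).
pixel_labels = ["B", "A", "B", "C", "B", "A"]
cell_label = vote(pixel_labels)
print(cell_label)
```

Aggregating many noisy pixel-level predictions this way lets a small number of imaged cells yield a much larger training set while still producing one label per cell.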

12.

Accurate segmentation of head and neck radiotherapy CT scans with 3D CNNs: consistency is key.

Henderson, Edward G A; Vasquez Osorio, Eliana M; van Herk, Marcel; Brouwer, Charlotte L; Steenbakkers, Roel J H M; Green, Andrew F.

Phys Med Biol ; 68(8)2023 04 03.

Article in English | MEDLINE | ID: mdl-36893469

ABSTRACT

Objective. Automatic segmentation of organs-at-risk in radiotherapy planning computed tomography (CT) scans using convolutional neural networks (CNNs) is an active research area. Very large datasets are usually required to train such CNN models. In radiotherapy, large, high-quality datasets are scarce and combining data from several sources can reduce the consistency of training segmentations. It is therefore important to understand the impact of training data quality on the performance of auto-segmentation models for radiotherapy. Approach. In this study, we took an existing 3D CNN architecture for head and neck CT auto-segmentation and compared the performance of models trained with a small, well-curated dataset (n = 34) and then a far larger dataset (n = 185) containing less consistent training segmentations. We performed 5-fold cross-validations in each dataset and tested segmentation performance using the 95th percentile Hausdorff distance and mean distance-to-agreement metrics. Finally, we validated the generalisability of our models with an external cohort of patient data (n = 12) with five expert annotators. Main results. The models trained with a large dataset were greatly outperformed by models (of identical architecture) trained with a smaller, but higher consistency set of training samples. Our models trained with a small dataset produce segmentations of similar accuracy as expert human observers and generalised well to new data, performing within inter-observer variation. Significance. We empirically demonstrate the importance of highly consistent training samples when training a 3D auto-segmentation model for use in radiotherapy. Crucially, it is the consistency of the training segmentations which had a greater impact on model performance rather than the size of the dataset used.

Subject(s)

Head , Image Processing, Computer-Assisted , Humans , Image Processing, Computer-Assisted/methods , Neck , Neural Networks, Computer , Tomography, X-Ray Computed

13.

Nano-read-across predictions of toxicity of metal oxide engineered nanoparticles (MeOx ENPs) used in nanopesticides to BEAS-2B and RAW 264.7 cells.

Roy, Joyita; Roy, Kunal.

Nanotoxicology ; 16(5): 629-644, 2022 06.

Article in English | MEDLINE | ID: mdl-36260491

ABSTRACT

The demand for nutrients and new technologies has increased with population growth. The agro-technological revolution with metal oxide engineered nanoparticles (MeOx ENPs) has the potential to reform the resilient agricultural system while maintaining the security of food. When utilized extensively, MeOx ENPs may have unintended toxicological effects on both target and non-targeted species. Since limited information about nanopesticides' pernicious effects is available, in silico modeling can be done to explore these issues. Hence, in the present work, we have applied computational modeling to explore the influence of metal oxide nanoparticles on the toxicity of bronchial epithelial (BEAS-2B) and murine myeloid (RAW 264.7) cells to bridge the data gap relating to the toxicity of MeOx NPs. Initially, partial least squares (PLS) regression models were developed applying the Small Dataset Modeler software (http://teqip.jdvu.ac.in/QSAR_Tools/DTCLab/) using four datasets having effective concentration (EC50%) as the endpoints and employing only periodic table descriptors. To further explore the predictions, we applied a read-across approach using the descriptors selected in the QSAR models. Also, the inter-endpoint cytotoxicity relationship modeling (quantitative toxicity-toxicity relationship or QTTR) was conducted. It was found that the result obtained by nano-read-across provided a similar level of accuracy as provided by QSAR. The information derived from the PLS models of both the cell lines suggested that metal cation formation, and bond-forming capacity influence the toxicity whereas the presence of metal has an influential impact on the ecotoxicological effects. Thus, it is feasible to design safe nanopesticides that could be more effective than conventional analogs.

Subject(s)

Metal Nanoparticles , Oxides , Mice , Animals , RAW 264.7 Cells , Oxides/toxicity , Metal Nanoparticles/toxicity , Metals , Ecotoxicology , Quantitative Structure-Activity Relationship

14.

Modeling and mechanistic understanding of cytotoxicity of metal oxide nanoparticles (MeOxNPs) to Escherichia coli: categorization and data gap filling for untested metal oxides.

Roy, Joyita; Roy, Kunal.

Nanotoxicology ; 16(2): 152-164, 2022 03.

Article in English | MEDLINE | ID: mdl-35166631

ABSTRACT

Metal oxide nanoparticles (MeOxNPs) production is expected to increase every year exponentially, and their potential to cause adverse effects on the environment and human health will also expand rapidly. Hence, risk assessment of nanoparticles (NPs) is necessary to design ecosafe products. However, experimental ecotoxicological assessments are time-consuming, requiring a lot of resources. Therefore, researchers rely on alternative in silico approaches to predict the behavior of NPs in the biological system. Quantitative structure-toxicity relationship (QSTR) has been adopted as a potential method to predict the cytotoxicity of untested NPs. Hence, in the present study, multiple linear regression (MLR) models were developed using 17 MeOxNPs on Escherichia coli (E. coli) bacteria cells under both light and dark conditions. The models were developed applying Small Dataset Modeler software, version 1.0.0 (http://teqip.jdvu.ac.in/QSAR_Tools/DTCLab/) which generates models with a limited number of data points. Periodic table-based descriptors (both 1st and 2nd generation) were used for the modeling purpose. Two statistically significant MLR models based on photo-induced toxicity (Q²(LOO) = 0.612, R² = 0.726) and dark-based toxicity (Q²(LOO) = 0.627, R² = 0.770) were developed. From the developed models, we interpreted that an increase in valency and oxidation state of the metal will decrease the cytotoxicity whereas the atomic radius of the metal and electronegativity of MeOxNPs influence the toxicity toward E. coli cells. The MLR models were validated using different internal validation metrics. Additionally, we have collected 42 MeOxNPs as an external set to observe the predictive power of the two developed MLR models and categorize them into toxic and non-toxic classes. The chemical features selected in the developed models are important for understanding the mechanisms of nanotoxicity. Thus, the developed models can be a scientific basis for designing safer NPs.

Subject(s)

Metal Nanoparticles , Oxides , Escherichia coli , Humans , Metal Nanoparticles/chemistry , Metal Nanoparticles/toxicity , Metals , Oxides/chemistry , Oxides/toxicity , Quantitative Structure-Activity Relationship
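
The abstract above reports both R² and a leave-one-out cross-validated Q²(LOO) for each model. The sketch below shows how the two statistics differ, using a single-descriptor least-squares line for brevity (the study used multiple linear regression with periodic-table descriptors). The data points and function names are invented, and Q² is computed here as 1 − PRESS/SS_tot with the full-set mean, which is one common convention.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs, ys):
    """Goodness of fit on the same data the model was trained on."""
    slope, intercept = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out Q2: each point is predicted by a model fitted without it."""
    my = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - press / ss_tot


# Invented descriptor/response pairs for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
r2 = r_squared(xs, ys)
q2 = q_squared_loo(xs, ys)
print(f"R2 = {r2:.3f}, Q2(LOO) = {q2:.3f}")
```

Q²(LOO) cannot exceed R² for a least-squares fit, which is why the reported Q² values are lower than the matching R² values; a large gap between the two suggests overfitting.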

15.

QSAR model to predict K_p,uu,brain with a small dataset, incorporating predicted values of related parameter.

Umemori, Y; Handa, K; Sakamoto, S; Kageyama, M; Iijima, T.

SAR QSAR Environ Res ; 33(11): 885-897, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36420623

ABSTRACT

The unbound brain-to-plasma concentration ratio (Kp,uu,brain) is a parameter that indicates the extent of central nervous system penetration. Pharmaceutical companies build prediction models because many experiments are required to obtain Kp,uu,brain. However, the lack of data hinders the design of an accurate prediction model. To construct a quantitative structure-activity relationship (QSAR) model with a small dataset of Kp,uu,brain, we investigated whether the prediction accuracy could be improved by incorporating software-predicted brain penetration-related parameters (BPrPs) as explanatory variables for pharmacokinetic parameter prediction. We collected 88 compounds with experimental Kp,uu,brain from various official publications. Random forest was used as the machine learning model. First, we developed prediction models using only structural descriptors. Second, we verified the predictive accuracy of each model with the predicted values of BPrPs incorporated in various combinations. Third, the Kp,uu,brain of the in-house compounds was predicted and compared with the experimental values. The prediction accuracy was improved using five-fold cross-validation (RMSE = 0.455, r² = 0.726) by incorporating BPrPs. Additionally, this model was verified using an external in-house dataset. The result suggested that using BPrPs as explanatory variables improves the prediction accuracy of the Kp,uu,brain QSAR model when the available number of datasets is small.

Subject(s)

Brain , Quantitative Structure-Activity Relationship , Machine Learning , Software

16.

Detecting Proximal Caries on Periapical Radiographs Using Convolutional Neural Networks with Different Training Strategies on Small Datasets.

Lin, Xiujiao; Hong, Dengwei; Zhang, Dong; Huang, Mingyi; Yu, Hao.

Diagnostics (Basel) ; 12(5)2022 Apr 21.

Article in English | MEDLINE | ID: mdl-35626203

ABSTRACT

The present study aimed to evaluate the performance of convolutional neural networks (CNNs) that were trained with small datasets using different strategies in the detection of proximal caries at different levels of severity on periapical radiographs. Small datasets containing 800 periapical radiographs were randomly categorized into a training and validation dataset (n = 600) and a test dataset (n = 200). A pretrained Cifar-10Net CNN was used in the present study. Different training strategies were used to train the CNN model independently; these strategies were defined as image recognition (IR), edge extraction (EE), and image segmentation (IS). Different metrics, such as sensitivity and area under the receiver operating characteristic curve (AUC), for the trained CNN and human observers were analysed to evaluate the performance in detecting proximal caries. IR, EE, and IS recognition modes and human eyes achieved AUCs of 0.805, 0.860, 0.549, and 0.767, respectively, with the EE recognition mode having the highest values (p all < 0.05). The EE recognition mode was significantly more sensitive in detecting both enamel and dentin caries than human eyes (p all < 0.05). The CNN trained with the EE strategy, the best performer in the present study, showed potential utility in detecting proximal caries on periapical radiographs when using small datasets.
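
The AUC values compared in the abstract above can be read as the probability that a randomly chosen carious surface receives a higher score than a randomly chosen sound one. The sketch below computes AUC directly from that pairwise definition; it is illustrative only, with invented scores and labels, and is not the statistical software used in the study.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented classifier scores (1 = caries present, 0 = sound surface).
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
value = auc(scores, labels)
print(f"AUC = {value:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking, which puts the 0.549 reported for the IS mode close to random guessing.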

17.

Neural-Network-Based Approaches for Optimization of Machining Parameters Using Small Dataset.

Kosarac, Aleksandar; Mladjenovic, Cvijetin; Zeljkovic, Milan; Tabakovic, Slobodan; Knezev, Milos.

Materials (Basel) ; 15(3)2022 Jan 18.

Article in English | MEDLINE | ID: mdl-35160646

ABSTRACT

Surface quality is one of the most important indicators of the quality of machined parts. The analytical method of defining the arithmetic mean roughness is not applied in practice due to its complexity, and empirical models are applied only for certain values of machining parameters. This paper presents the design and development of artificial neural networks (ANNs) for the prediction of the arithmetic mean roughness, which is one of the most common surface roughness parameters. The dataset used for ANN development was obtained experimentally by machining AA7075 aluminum alloy under various machining conditions. With four factors, each having three levels, the full factorial design considers a total of 81 experiments that have to be carried out. Using input factor-level settings and adopting the Taguchi method, the experiments were reduced from 81 runs to 27 runs through an orthogonal design. In this study we aimed to check how reliable the results of artificial neural networks were when obtained based on a small input-output dataset, as in the case of applying the Taguchi methodology of planning a four-factor and three-level experiment, in which 27 trials were conducted. Furthermore, this paper considers the optimization of machining parameters for minimizing surface roughness in machining AA7075 aluminum alloy. The results show that ANNs can be successfully trained with small data and used to predict the arithmetic mean roughness. The best results were achieved by backpropagation multilayer feedforward neural networks using the BR algorithm for training.
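
The reduction from 81 to 27 runs described in the abstract above can be illustrated with a small sketch. It enumerates the full 3⁴ factorial design and then builds one possible 27-run orthogonal design by deriving the fourth factor's level from the other three (modulo 3). This construction is an assumption chosen for illustration; the abstract does not state which columns of the Taguchi array the authors assigned to their factors.

```python
from collections import Counter
from itertools import product

LEVELS = (0, 1, 2)  # three coded levels per machining factor

# Full factorial: every combination of four three-level factors.
full_factorial = list(product(LEVELS, repeat=4))

# One 27-run orthogonal design: the fourth factor is fixed by the first three.
fractional = [(a, b, c, (a + b + c) % 3) for a, b, c in product(LEVELS, repeat=3)]


def is_pairwise_balanced(runs):
    """True if every pair of factors shows each of the 9 level pairs exactly 3 times."""
    n_factors = len(runs[0])
    for i in range(n_factors):
        for j in range(i + 1, n_factors):
            counts = Counter((run[i], run[j]) for run in runs)
            if len(counts) != 9 or set(counts.values()) != {3}:
                return False
    return True


print(len(full_factorial), len(fractional), is_pairwise_balanced(fractional))
```

The balance check is what makes the design orthogonal: each factor's main effect can be estimated without being confounded with another factor's main effect, even though only a third of the combinations are run.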

18.

Effect of dataset size, image quality, and image type on deep learning-based automatic prostate segmentation in 3D ultrasound.

Orlando, Nathan; Gyacskov, Igor; Gillies, Derek J; Guo, Fumin; Romagnoli, Cesare; D'Souza, David; Cool, Derek W; Hoover, Douglas A; Fenster, Aaron.

Phys Med Biol ; 67(7)2022 03 29.

Article in English | MEDLINE | ID: mdl-35240585

ABSTRACT

Three-dimensional (3D) transrectal ultrasound (TRUS) is utilized in prostate cancer diagnosis and treatment, necessitating time-consuming manual prostate segmentation. We have previously developed an automatic 3D prostate segmentation algorithm involving deep learning prediction on radially sampled 2D images followed by 3D reconstruction, trained on a large, clinically diverse dataset with variable image quality. As large clinical datasets are rare, widespread adoption of automatic segmentation could be facilitated with efficient 2D-based approaches and the development of an image quality grading method. The complete training dataset of 6761 2D images, resliced from 206 3D TRUS volumes acquired using end-fire and side-fire acquisition methods, was split to train two separate networks using either end-fire or side-fire images. Split datasets were reduced to 1000, 500, 250, and 100 2D images. For deep learning prediction, modified U-Net and U-Net++ architectures were implemented and compared using an unseen test dataset of 40 3D TRUS volumes. A 3D TRUS image quality grading scale with three factors (acquisition quality, artifact severity, and boundary visibility) was developed to assess the impact on segmentation performance. For the complete training dataset, U-Net and U-Net++ networks demonstrated equivalent performance, but when trained using split end-fire/side-fire datasets, U-Net++ significantly outperformed the U-Net. Compared to the complete training datasets, U-Net++ trained using reduced-size end-fire and side-fire datasets demonstrated equivalent performance down to 500 training images. For this dataset, image quality had no impact on segmentation performance for end-fire images but did have a significant effect for side-fire images, with boundary visibility having the largest impact. 
Our algorithm provided fast (<1.5 s) and accurate 3D segmentations across clinically diverse images, demonstrating generalizability and efficiency when employed on smaller datasets, supporting the potential for widespread use, even when data is scarce. The development of an image quality grading scale provides a quantitative tool for assessing segmentation performance.

Subject(s)

Deep Learning , Prostatic Neoplasms , Humans , Male , Pelvis , Prostate/diagnostic imaging , Prostatic Neoplasms/diagnostic imaging , Ultrasonography

19.

Optimized XGBoost Model with Small Dataset for Predicting Relative Density of Ti-6Al-4V Parts Manufactured by Selective Laser Melting.

Zou, Miao; Jiang, Wu-Gui; Qin, Qing-Hua; Liu, Yu-Cheng; Li, Mao-Lin.

Materials (Basel) ; 15(15)2022 Aug 01.

Article in English | MEDLINE | ID: mdl-35955237

ABSTRACT

Determining the quality of Ti-6Al-4V parts fabricated by selective laser melting (SLM) remains a challenge due to the high cost of SLM and the need for expertise in processes and materials. In order to understand the correspondence of the relative density of SLMed Ti-6Al-4V parts with process parameters, an optimized extreme gradient boosting (XGBoost) decision tree model was developed in the present paper using hyperparameter optimization with the GridsearchCV method. In particular, the effect of the size of the dataset for model training and testing on model prediction accuracy was examined. The results show that with the reduction in dataset size, the prediction accuracy of the proposed model decreases, but the overall accuracy can be maintained within a relatively high accuracy range, showing good agreement with the experimental results. Based on a small dataset, the prediction accuracy of the optimized XGBoost model was also compared with that of artificial neural network (ANN) and support vector regression (SVR) models, and it was found that the optimized XGBoost model has better evaluation indicators such as mean absolute error, root mean square error, and the coefficient of determination. In addition, the optimized XGBoost model can be easily extended to the prediction of mechanical properties of more metal materials manufactured by SLM processes.
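The three evaluation indicators named in the abstract above (mean absolute error, root mean square error, and the coefficient of determination) follow standard definitions, sketched below in plain Python. The function name and the sample values are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)      # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
mae, rmse, r2 = regression_metrics(observed, predicted)
```

On small datasets these scores can vary considerably between train/test splits, so they are usually averaged over cross-validation folds rather than read from a single split.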

20.

A Comparative Study of Deep Learning Classification Methods on a Small Environmental Microorganism Image Dataset (EMDS-6): From Convolutional Neural Networks to Visual Transformers.

Zhao, Peng; Li, Chen; Rahaman, Md Mamunur; Xu, Hao; Yang, Hechen; Sun, Hongzan; Jiang, Tao; Grzegorzek, Marcin.

Front Microbiol ; 13: 792166, 2022.

Article in English | MEDLINE | ID: mdl-35308350

ABSTRACT

In recent years, deep learning has made brilliant achievements in Environmental Microorganism (EM) image classification. However, image classification of small EM datasets has still not obtained good research results. Therefore, researchers need to spend a lot of time searching for models that have good classification performance and suit the current equipment working environment. To provide reliable references for researchers, we conduct a series of comparison experiments on 21 deep learning models. The experiments include direct classification, imbalanced training, and hyper-parameter tuning. During the experiments, we find complementarities among the 21 models, which is the basis for feature fusion related experiments. We also find that data augmentation by geometric deformation hardly improves the performance of the VT series models (ViT, DeiT, BotNet, and T2T-ViT). In terms of model performance, Xception has the best classification performance, the vision transformer (ViT) model consumes the least time for training, and the ShuffleNet-V2 model has the fewest parameters.
