Results 1 - 20 of 7,626
1.
Radiat Oncol ; 19(1): 106, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39113123

ABSTRACT

PURPOSE: Convolutional Neural Networks (CNNs) have emerged as transformative tools in the field of radiation oncology, significantly advancing the precision of contouring practices. However, the adaptability of these algorithms across diverse scanners, institutions, and imaging protocols remains a considerable obstacle. This study aims to investigate the effects of incorporating institution-specific datasets into the training regimen of CNNs to assess their generalization ability in real-world clinical environments. Focusing on a data-centric analysis, we examine the influence of multi- and single-center training approaches on algorithm performance. METHODS: nnU-Net is trained using a dataset comprising 161 18F-PSMA-1007 PET images collected from four distinct institutions (Freiburg: n = 96, Munich: n = 19, Cyprus: n = 32, Dresden: n = 14). The dataset is partitioned such that data from each center are systematically excluded from training and used solely for testing to assess the model's generalizability and adaptability to data from unfamiliar sources. Performance is compared through five-fold cross-validation, providing a detailed comparison between models trained on datasets from single centers and those trained on aggregated multi-center datasets. Dice Similarity Score, Hausdorff distance, and volumetric analysis are used as the primary evaluation metrics. RESULTS: The mixed training approach yielded a median DSC of 0.76 (IQR: 0.64-0.84) in a five-fold cross-validation, showing no significant differences (p = 0.18) compared to models trained with data exclusion from each center, which performed with a median DSC of 0.74 (IQR: 0.56-0.86). Significant performance improvements regarding multi-center training were observed for the Dresden cohort (multi-center median DSC 0.71, IQR: 0.58-0.80 vs. single-center 0.68, IQR: 0.50-0.80, p < 0.001) and Cyprus cohort (multi-center 0.74, IQR: 0.62-0.83 vs. single-center 0.72, IQR: 0.54-0.82, p < 0.01). While Munich and Freiburg also showed performance improvements with multi-center training, the results did not reach statistical significance (Munich: multi-center DSC 0.74, IQR: 0.60-0.80 vs. single-center 0.72, IQR: 0.59-0.82, p > 0.05; Freiburg: multi-center 0.78, IQR: 0.53-0.87 vs. single-center 0.71, IQR: 0.53-0.83, p = 0.23). CONCLUSION: CNNs trained to auto-contour the intraprostatic GTV in 18F-PSMA-1007 PET on a diverse dataset from multiple centers mostly generalize well to unseen data from other centers. Training on a multi-center dataset can improve intraprostatic 18F-PSMA-1007 PET GTV segmentation performance compared to training exclusively on a single-center dataset. The segmentation performance of the same CNN can vary depending on the dataset employed for training and testing.
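For readers unfamiliar with the primary metric, the Dice Similarity Score and the leave-one-center-out evaluation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code; the center labels and the mask arrays are assumptions.

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: count as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Leave-one-center-out evaluation (center names/sizes taken from the abstract)
centers = {"Freiburg": 96, "Munich": 19, "Cyprus": 32, "Dresden": 14}
for held_out in centers:
    train_centers = [c for c in centers if c != held_out]
    # ... train nnU-Net on cases from train_centers, predict on held_out, then:
    # scores = [dice_score(p, t) for p, t in held_out_prediction_pairs]
    print(f"train on {train_centers}, test on {held_out}")
```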


Subject(s)
Neural Networks, Computer , Prostatic Neoplasms , Humans , Male , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/radiotherapy , Prostatic Neoplasms/pathology , Positron-Emission Tomography/methods , Niacinamide/analogs & derivatives , Oligopeptides , Radiopharmaceuticals , Fluorine Radioisotopes , Image Processing, Computer-Assisted/methods , Datasets as Topic , Algorithms
2.
Nature ; 631(8022): 924-925, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39039191
3.
BMC Pregnancy Childbirth ; 24(1): 460, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961444

ABSTRACT

BACKGROUND AND AIMS: Although minimally invasive hysterectomy offers advantages, abdominal hysterectomy remains the predominant surgical method. Creating a standardized dataset and establishing a hysterectomy registry system present opportunities for early interventions in reducing volume and selecting benign hysterectomy methods. This research aims to develop a dataset for designing a benign hysterectomy registration system. METHODS: Between April and September 2020, a qualitative study was carried out to create a dataset for enrolling patients who were candidates for hysterectomy. At this stage, the research team conducted an information needs assessment, identified relevant data elements, developed registry software, and performed field testing; subsequently, a web-based application was designed. In June 2023, the registry software was evaluated using data extracted from medical records of patients admitted to Al-Zahra Hospital in Tabriz, Iran. RESULTS: Over two months, 40 patients with benign hysterectomy were successfully registered. The final dataset for the hysterectomy patient registry comprises 11 main groups, 27 subclasses, and a total of 91 data elements. Mandatory data and essential reports were defined. Furthermore, a web-based registry system was designed and evaluated based on the dataset and various scenarios. CONCLUSION: Creating a hysterectomy registration system is the initial stride toward identifying and registering hysterectomy candidate patients. This system captures information about procedure techniques and associated complications. In Iran, this registry can serve as a valuable resource for assessing the quality of care delivered and the distribution of clinical measures.


Subject(s)
Hospitals, Teaching , Hysterectomy , Registries , Humans , Female , Iran , Hysterectomy/methods , Hysterectomy/statistics & numerical data , Adult , Middle Aged , Referral and Consultation/statistics & numerical data , Qualitative Research , Datasets as Topic
4.
JCO Clin Cancer Inform ; 8: e2300245, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38959448

ABSTRACT

A primer that helps clarify large-scale clinical datasets and participant demographics for oncologists.


Subject(s)
Neoplasms , Oncologists , Humans , Neoplasms/epidemiology , Medical Oncology/methods , Datasets as Topic , Databases, Factual
5.
Nat Neurosci ; 27(7): 1214, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38982200
6.
Cardiovasc Diabetol ; 23(1): 240, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978031

ABSTRACT

BACKGROUND: Metabolism is increasingly recognized as a key regulator of the function and phenotype of the primary cellular constituents of the atherosclerotic vascular wall, including endothelial cells, smooth muscle cells, and inflammatory cells. However, a comprehensive analysis of metabolic changes associated with the transition of plaque from a stable to a hemorrhaged phenotype is lacking. METHODS: In this study, we integrated two large mRNA expression and protein abundance datasets (BIKE, n = 126; MaasHPS, n = 43) from human atherosclerotic carotid artery plaque to reconstruct a genome-scale metabolic network (GEM). Next, the GEM findings were linked to metabolomics data from MaasHPS, providing a comprehensive overview of metabolic changes in human plaque. RESULTS: Our study identified significant changes in lipid, cholesterol, and inositol metabolism, along with altered lysosomal lytic activity and increased inflammatory activity, in unstable plaques with intraplaque hemorrhage (IPH+) compared to non-hemorrhaged (IPH-) plaques. Moreover, topological analysis of this network model revealed that the conversion of glutamine to glutamate and their flux between the cytoplasm and mitochondria were notably compromised in hemorrhaged plaques, with a significant reduction in overall glutamate levels in IPH+ plaques. Additionally, reduced glutamate availability was associated with an increased presence of macrophages and a pro-inflammatory phenotype in IPH+ plaques, suggesting an inflammation-prone microenvironment. CONCLUSIONS: This study is the first to establish a robust and comprehensive GEM for atherosclerotic plaque, providing a valuable resource for understanding plaque metabolism. The utility of this GEM was illustrated by its ability to reliably predict dysregulation in cholesterol hydroxylation, inositol metabolism, and the glutamine/glutamate pathway in rupture-prone hemorrhaged plaques, a finding that may pave the way to new diagnostic or therapeutic measures.
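For orientation, a genome-scale metabolic network of this kind is commonly interrogated with flux balance analysis, for example via COBRApy. The sketch below is illustrative only; the model file name and the BiGG-style reaction ID "GLUN" (glutaminase, converting glutamine to glutamate) are assumptions, not the authors' pipeline.

```python
import cobra

# Load a genome-scale metabolic model; "plaque_gem.xml" is a hypothetical file
model = cobra.io.read_sbml_model("plaque_gem.xml")

# Inspect a glutamine -> glutamate conversion reaction; the ID "GLUN" assumes
# a BiGG-style namespace and may differ in the actual reconstruction
rxn = model.reactions.get_by_id("GLUN")
print(rxn.build_reaction_string())

# Flux balance analysis: optimize the model's objective and read the flux
solution = model.optimize()
print(f"{rxn.id} flux: {solution.fluxes[rxn.id]:.3f}")
```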


Subject(s)
Carotid Artery Diseases , Glutamic Acid , Glutamine , Macrophages , Metabolic Networks and Pathways , Phenotype , Plaque, Atherosclerotic , Humans , Glutamine/metabolism , Glutamic Acid/metabolism , Macrophages/metabolism , Macrophages/pathology , Carotid Artery Diseases/metabolism , Carotid Artery Diseases/pathology , Carotid Artery Diseases/genetics , Rupture, Spontaneous , Carotid Arteries/pathology , Carotid Arteries/metabolism , Metabolomics , Databases, Genetic , Inflammation/metabolism , Inflammation/genetics , Inflammation/pathology , Energy Metabolism , Datasets as Topic , Male
7.
Nature ; 632(8023): 55-62, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39085539

ABSTRACT

Advancements in optical coherence control[1-5] have unlocked many cutting-edge applications, including long-haul communication, light detection and ranging (LiDAR) and optical coherence tomography[6-8]. Prevailing wisdom suggests that using more coherent light sources leads to enhanced system performance and device functionalities[9-11]. Our study introduces a photonic convolutional processing system that takes advantage of partially coherent light to boost computing parallelism without substantially sacrificing accuracy, potentially enabling larger-size photonic tensor cores. The reduction of the degree of coherence optimizes bandwidth use in the photonic convolutional processing system. This breakthrough challenges the traditional belief that coherence is essential or even advantageous in integrated photonic accelerators, thereby enabling the use of light sources with less rigorous feedback control and thermal-management requirements for high-throughput photonic computing. Here we demonstrate such a system in two photonic platforms for computing applications: a photonic tensor core using phase-change-material photonic memories that delivers parallel convolution operations to classify the gaits of ten patients with Parkinson's disease with 92.2% accuracy (92.7% theoretically) and a silicon photonic tensor core with embedded electro-absorption modulators (EAMs) to facilitate 0.108 tera operations per second (TOPS) convolutional processing for classifying the Modified National Institute of Standards and Technology (MNIST) handwritten digits dataset with 92.4% accuracy (95.0% theoretically).
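In software terms, the convolution that such a tensor core parallelizes optically is a patch-matrix multiplication (im2col). A minimal NumPy analogue of that operation, purely illustrative (deep-learning-style correlation, kernel not flipped):

```python
import numpy as np

def conv2d_as_matmul(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2-D convolution expressed as a single matrix product,
    the operation a photonic tensor core computes in parallel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    # im2col: unroll each receptive field into one row of a patch matrix
    patches = np.array([
        image[i:i + kh, j:j + kw].ravel()
        for i in range(oh) for j in range(ow)
    ])
    return (patches @ kernel.ravel()).reshape(oh, ow)

out = conv2d_as_matmul(np.arange(16.0).reshape(4, 4), np.ones((2, 2)))
print(out)  # 3x3 map of sliding-window sums
```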


Subject(s)
Neural Networks, Computer , Optics and Photonics , Photons , Tomography, Optical Coherence , Humans , Optics and Photonics/instrumentation , Optics and Photonics/methods , Parkinson Disease/diagnosis , Parkinson Disease/physiopathology , Silicon/chemistry , Tomography, Optical Coherence/instrumentation , Tomography, Optical Coherence/methods , Gait/physiology , Datasets as Topic , Sensitivity and Specificity
8.
Forensic Sci Int ; 361: 112150, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39047517

ABSTRACT

When a disaster occurs, the authorities must prioritise two things: first, the search and rescue of lives, and second, the identification and management of deceased individuals. However, with thousands of dead bodies to be individually identified in mass disasters, forensic teams face challenges such as long working hours, resulting in a delayed identification process, and a public health concern caused by the decomposition of the bodies. Using dental panoramic imaging, teeth have been used in forensics as a physical marker to estimate the age of an individual. Traditionally, dental age estimation has been performed manually by experts. Although the procedure is fairly simple, the large number of victims and the limited amount of time available to complete the assessment during large-scale disasters make forensic work even more challenging. The emergence of artificial intelligence (AI) in the fields of medicine and dentistry has led to the suggestion of automating the current process as an alternative to the conventional method. This study aims to test the accuracy and performance of the developed deep convolutional neural network system for age estimation in a large, out-of-sample dataset of Malaysian children using digital dental panoramic imaging. The Forensic Dental Estimation Lab (F-DentEst Lab) is a computer application developed to perform dental age estimation digitally. This system is intended to improve on the conventional method by significantly increasing the efficiency of the age estimation process through an AI-based approach. A total of 1,892 digital dental panoramic images were retrospectively collected to test the F-DentEst Lab. Data training, validation, and testing were conducted in the early stage of the development of F-DentEst Lab, with 80% of the data allocated to training and the remaining 20% to testing. The methodology comprised four major steps: image preprocessing, which adheres to the inclusion criteria for panoramic dental imaging; segmentation of mandibular premolars using the Dynamic Programming-Active Contour (DP-AC) method; classification using a Deep Convolutional Neural Network (DCNN); and statistical analysis. The suggested DCNN approach underestimated chronological age, with a small mean error (ME) of 0.03 and 0.05 for females and males, respectively.
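The reported 80%/20% allocation and the signed mean error (ME) can be sketched as follows; the arrays and stand-in predictions below are placeholders, not the F-DentEst Lab implementation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the 1,892 preprocessed panoramic images
# and the corresponding chronological ages in years
images = np.random.rand(1892, 64, 64)
ages = np.random.uniform(5, 18, 1892)

X_train, X_test, y_train, y_test = train_test_split(
    images, ages, test_size=0.20, random_state=42)

# ... train a DCNN regressor on (X_train, y_train), predict on X_test ...
y_pred = y_test + np.random.normal(0, 0.5, y_test.shape)  # stand-in predictions

mean_error = np.mean(y_pred - y_test)  # signed ME; negative = underestimation
print(f"ME: {mean_error:+.2f} years")
```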


Subject(s)
Age Determination by Teeth , Forensic Dentistry , Neural Networks, Computer , Radiography, Panoramic , Humans , Age Determination by Teeth/methods , Malaysia , Forensic Dentistry/methods , Child , Male , Female , Adolescent , Datasets as Topic , Deep Learning , Image Processing, Computer-Assisted
10.
Eur J Radiol ; 177: 111592, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38968751

ABSTRACT

OBJECTIVES: CT pulmonary angiography is the gold standard for diagnosing pulmonary embolism (PE), and deep learning (DL) algorithms are being developed to manage the increase in demand. nnU-Net is a new auto-adaptive DL framework that minimizes manual tuning, making it easier to develop effective algorithms for medical imaging even without specific expertise. This study assesses the performance of a locally developed nnU-Net algorithm on the RSPECT dataset for PE detection, clot volume measurement, and correlation with right ventricle (RV) overload. MATERIALS & METHODS: User input was limited to segmentation using 3DSlicer. We worked with the RSPECT dataset and trained an algorithm on 205 PE-positive and 340 negative exams. The test dataset comprised 6,573 exams. Performance was tested against PE characteristics such as central PE, non-central PE, and RV overload. Blood clot volume (BCV) was extracted from each exam. We employed ROC curves and logistic regression for statistical validation. RESULTS: Negative studies had a median BCV of 1 µL, which increased to 345 µL in PE-positive cases and 7,378 µL in central PEs. Statistical analysis confirmed a significant BCV correlation with PE presence, central PE, and increased RV/LV ratio (p < 0.0001). The model's AUC for PE detection was 0.865, with 83% accuracy at a 55 µL threshold. Central PE detection AUC was 0.937, with 91% accuracy at 850 µL. The RV overload AUC stood at 0.848, with 79% accuracy. CONCLUSION: The nnU-Net algorithm demonstrated accurate PE detection, particularly for central PE. BCV is an accurate metric for automated severity stratification and case prioritization. CLINICAL RELEVANCE STATEMENT: The nnU-Net framework can be utilized to create a dependable DL algorithm for detecting PE. It offers a user-friendly approach to those lacking expertise in AI and rapidly extracts the BCV, a metric that can evaluate PE severity.
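Deriving the blood clot volume from a binary segmentation and applying the reported 55 µL operating point could look like the sketch below; the voxel spacing and placeholder data are assumptions about the workflow, not the authors' code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def blood_clot_volume_ul(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Sum segmented voxels times voxel volume; 1 mm^3 equals 1 µL."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    return float(mask.sum()) * voxel_mm3

# Per-exam volumes and ground-truth PE labels (placeholder data)
bcv = np.array([0.0, 12.0, 420.0, 9000.0, 60.0])
labels = np.array([0, 0, 1, 1, 1])

print("AUC:", roc_auc_score(labels, bcv))
print("PE-positive at 55 µL threshold:", bcv > 55.0)
```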


Subject(s)
Computed Tomography Angiography , Deep Learning , Pulmonary Embolism , Pulmonary Embolism/diagnostic imaging , Humans , Computed Tomography Angiography/methods , Male , Algorithms , Female , Severity of Illness Index , Middle Aged , Datasets as Topic , Aged
11.
Aust J Prim Health ; 30, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38981000

ABSTRACT

BACKGROUND: Large datasets exist in Australia that make de-identified primary healthcare data extracted from clinical information systems available for research use. This study reviews these datasets for their capacity to provide insight into chronic disease care for Aboriginal and Torres Strait Islander peoples, and the extent to which the principles of Indigenous Data Sovereignty are reflected in data collection and governance arrangements. METHODS: Datasets were included if they collect primary healthcare clinical information system data, collect data nationally, and capture Aboriginal and Torres Strait Islander peoples. We searched PubMed and the public Internet for data providers meeting the inclusion criteria. We developed a framework to assess data providers across domains, including representativeness, usability, data quality, adherence with Indigenous Data Sovereignty and their capacity to provide insights into chronic disease. Datasets were assessed against the framework based on email interviews and publicly available information. RESULTS: We identified seven datasets. Only two datasets reported on chronic disease, collected data nationally and captured a substantial number of Aboriginal and Torres Strait Islander patients. No dataset was identified that captured a significant number of both mainstream general practice clinics and Aboriginal Community Controlled Health Organisations. CONCLUSIONS: It is critical that more accurate, comprehensive and culturally meaningful Aboriginal and Torres Strait Islander healthcare data are collected. These improvements must be guided by the principles of Indigenous Data Sovereignty and Governance. Validated and appropriate chronic disease indicators for Aboriginal and Torres Strait Islander peoples must be developed, including indicators of social and cultural determinants of health.


Subject(s)
General Practice , Health Services, Indigenous , Humans , Australia , Australian Aboriginal and Torres Strait Islander Peoples , Chronic Disease , Datasets as Topic , General Practice/statistics & numerical data , General Practice/methods , Health Services, Indigenous/statistics & numerical data , Primary Health Care/statistics & numerical data
12.
Nature ; 632(8023): 166-173, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39020176

ABSTRACT

Gene expression in Arabidopsis is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA-binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for nearly all Arabidopsis TFs, we lack knowledge about the presence, location and transcriptional strength of their ADs[1]. To address this gap, here we use a yeast library approach to experimentally identify Arabidopsis ADs on a proteome-wide scale, and find that more than half of the Arabidopsis TFs contain an AD. We annotate 1,553 ADs, the vast majority of which are, to our knowledge, previously unknown. Using the dataset generated, we develop a neural network to accurately predict ADs and to identify sequence features that are necessary to recruit coactivator complexes. We uncover six distinct combinations of sequence features that result in activation activity, providing a framework to interrogate the subfunctionalization of ADs. Furthermore, we identify ADs in the ancient AUXIN RESPONSE FACTOR family of TFs, revealing that AD positioning is conserved in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examining function in intrinsically disordered regions and a predictive model of ADs.
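A proteome-wide AD screen of this kind typically assays fixed-length tiles of each protein. A minimal sketch of that tiling step follows; the window size and the crude composition score are illustrative assumptions, not the authors' method or their neural network.

```python
def protein_tiles(seq: str, window: int = 40, step: int = 10):
    """Yield overlapping fixed-length tiles of a protein sequence,
    the unit a yeast activation-domain screen typically assays."""
    for start in range(0, max(len(seq) - window + 1, 1), step):
        yield start, seq[start:start + window]

def crude_ad_score(tile: str) -> float:
    """Hypothetical acidic + aromatic/hydrophobic composition score."""
    acidic = sum(tile.count(a) for a in "DE")
    hydrophobic = sum(tile.count(a) for a in "FWLY")
    return (acidic + hydrophobic) / max(len(tile), 1)

seq = "MDLSFMEDWFDELLNSPFDDLLFEDDAKK"  # toy sequence, not a real TF
for start, tile in protein_tiles(seq, window=20, step=5):
    print(start, f"{crude_ad_score(tile):.2f}")
```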


Subject(s)
Arabidopsis Proteins , Arabidopsis , Gene Expression Regulation, Plant , Protein Domains , Transcription Factors , Transcriptional Activation , Arabidopsis/chemistry , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/classification , Arabidopsis Proteins/metabolism , Conserved Sequence/genetics , Datasets as Topic , Gene Expression Regulation, Plant/genetics , Indoleacetic Acids/metabolism , Intrinsically Disordered Proteins , Molecular Sequence Annotation , Neural Networks, Computer , Proteome/chemistry , Proteome/metabolism , Transcription Factors/chemistry , Transcription Factors/classification , Transcription Factors/metabolism , Transcriptional Activation/genetics
14.
JACC Cardiovasc Imaging ; 17(8): 865-876, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39001730

ABSTRACT

BACKGROUND: Global longitudinal strain (GLS) is reported to be more reproducible and prognostic than ejection fraction. Automated, transparent methods may increase trust and uptake. OBJECTIVES: The authors developed an open machine-learning-based GLS methodology and validated it using multiexpert consensus from the Unity UK Echocardiography AI Collaborative. METHODS: We trained a multi-image neural network (Unity-GLS) to identify the annulus, apex, and endocardial curve on 6,819 apical 4-, 2-, and 3-chamber images. The external validation dataset comprised those 3 views from 100 echocardiograms. End-systolic and -diastolic frames were each labelled by 11 experts to form consensus tracings and points. The experts also ordered the echocardiograms by visual grading of longitudinal function. One expert calculated global strain using 2 proprietary packages. RESULTS: The median GLS, averaged across the 11 individual experts, was -16.1 (IQR: -19.3 to -12.5). Using each case's expert consensus measurement as the reference standard, individual expert measurements had a median absolute error of 2.00 GLS units. In comparison, the errors of the machine methods were: Unity-GLS 1.3, proprietary A 2.5, proprietary B 2.2. The correlations with the expert consensus values were 0.85 for individual experts, 0.91 for Unity-GLS, 0.73 for proprietary A, and 0.79 for proprietary B. Using the multiexpert visual ranking as the reference, individual expert strain measurements achieved a median rank correlation of 0.72, Unity-GLS 0.77, proprietary A 0.70, and proprietary B 0.74. CONCLUSIONS: Our open-source approach to calculating GLS agrees with the experts' consensus as strongly as the individual expert measurements and proprietary machine solutions do. The training data, code, and trained networks are freely available online.
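GLS itself is the fractional shortening of the endocardial curve between end-diastole and end-systole. A minimal computation from traced contour points follows, assuming per-frame (x, y) coordinates; this is not the Unity-GLS code, which additionally averages across the three apical views.

```python
import numpy as np

def curve_length(points: np.ndarray) -> float:
    """Arc length of an endocardial contour given as an (N, 2) array."""
    return float(np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1)))

def gls_percent(diastole_pts: np.ndarray, systole_pts: np.ndarray) -> float:
    """Global longitudinal strain: % length change from end-diastole."""
    l_ed, l_es = curve_length(diastole_pts), curve_length(systole_pts)
    return 100.0 * (l_es - l_ed) / l_ed  # negative = shortening

ed = np.array([[0, 0], [1, 2], [2, 3], [3, 2], [4, 0]], dtype=float)
es = 0.85 * ed  # toy systolic contour, uniformly shortened by 15%
print(f"GLS: {gls_percent(ed, es):.1f}%")  # -15.0%
```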


Subject(s)
Consensus , Echocardiography , Image Interpretation, Computer-Assisted , Machine Learning , Neural Networks, Computer , Predictive Value of Tests , Humans , Biomechanical Phenomena , Datasets as Topic , Global Longitudinal Strain , Myocardial Contraction , Observer Variation , Reproducibility of Results , United Kingdom , Ventricular Dysfunction, Left/diagnostic imaging , Ventricular Dysfunction, Left/physiopathology , Ventricular Function, Left
15.
Science ; 385(6706): eadn5529, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39024439

ABSTRACT

Meiotic errors of relatively small chromosomes in oocytes result in egg aneuploidies that cause miscarriages and congenital diseases. Unlike somatic cells, which preferentially mis-segregate larger chromosomes, aged oocytes preferentially mis-segregate smaller chromosomes through unclear processes. Here, we provide a comprehensive three-dimensional chromosome identifying-and-tracking dataset throughout meiosis I in live mouse oocytes. This analysis reveals a prometaphase pathway that actively moves smaller chromosomes to the inner region of the metaphase plate. In the inner region, chromosomes are pulled by stronger bipolar microtubule forces, which facilitates premature chromosome separation, a major cause of segregation errors in aged oocytes. This study reveals a spatial pathway that facilitates aneuploidy of small chromosomes preferentially in aged eggs and implicates the role of the M phase in creating a chromosome size-based spatial arrangement.


Subject(s)
Aneuploidy , Chromosome Segregation , Meiosis , Microtubules , Oocytes , Animals , Female , Mice , Chromosomes, Mammalian/genetics , Metaphase , Microtubules/metabolism , Oocytes/cytology , Oocytes/metabolism , Datasets as Topic
16.
Nature ; 632(8023): 122-130, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39020179

ABSTRACT

Genetic variation that influences gene expression and splicing is a key source of phenotypic diversity[1-5]. Although invaluable, studies investigating these links in humans have been strongly biased towards participants of European ancestries, which constrains generalizability and hinders evolutionary research. Here, to address these limitations, we developed MAGE, an open-access RNA sequencing dataset of lymphoblastoid cell lines from 731 individuals from the 1000 Genomes Project[6], spread across 5 continental groups and 26 populations. Most variation in gene expression (92%) and splicing (95%) was distributed within versus between populations, which mirrored the variation in DNA sequence. We mapped associations between genetic variants and expression and splicing of nearby genes (cis-expression quantitative trait loci (eQTLs) and cis-splicing QTLs (sQTLs), respectively). We identified more than 15,000 putatively causal eQTLs and more than 16,000 putatively causal sQTLs that are enriched for relevant epigenomic signatures. These include 1,310 eQTLs and 1,657 sQTLs that are largely private to underrepresented populations. Our data further indicate that the magnitude and direction of causal eQTL effects are highly consistent across populations. Moreover, the apparent 'population-specific' effects observed in previous studies were largely driven by low resolution or additional independent eQTLs of the same genes that were not detected. Together, our study expands our understanding of human gene expression diversity and provides an inclusive resource for studying the evolution and function of human genomes.
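At its core, cis-QTL mapping regresses a molecular trait on genotype dosage for each variant-gene pair. A minimal sketch with simulated data (not the MAGE pipeline, which also handles covariates and multiple testing):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 731  # sample count matching the study's cohort size

# Genotype dosage (0/1/2 alternate alleles) and a normalized expression trait;
# the 0.4 effect size simulates a true eQTL
dosage = rng.integers(0, 3, size=n).astype(float)
expression = 0.4 * dosage + rng.normal(0, 1, size=n)

res = stats.linregress(dosage, expression)
print(f"beta={res.slope:.3f}  p={res.pvalue:.2e}")
```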


Subject(s)
Gene Expression Regulation , Genetic Variation , Genome, Human , Internationality , Quantitative Trait Loci , RNA Splicing , Racial Groups , Female , Humans , Male , Artifacts , Bias , Cell Line , Cohort Studies , Datasets as Topic , Epigenomics , Evolution, Molecular , Gene Expression Regulation/genetics , Genetics, Population , Genome, Human/genetics , Lymphocytes/cytology , Lymphocytes/metabolism , Quantitative Trait Loci/genetics , Racial Groups/genetics , RNA Splicing/genetics , Sequence Analysis, RNA
17.
BMC Med Inform Decis Mak ; 24(1): 152, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38831432

ABSTRACT

BACKGROUND: Machine learning (ML) has emerged as the predominant computational paradigm for analyzing large-scale datasets across diverse domains. The assessment of dataset quality stands as a pivotal precursor to the successful deployment of ML models. In this study, we introduce DREAMER (Data REAdiness for MachinE learning Research), an algorithmic framework leveraging supervised and unsupervised machine learning techniques to autonomously evaluate the suitability of tabular datasets for ML model development. DREAMER is openly accessible as a tool on GitHub and Docker, facilitating its adoption and further refinement within the research community. RESULTS: The proposed model in this study was applied to three distinct tabular datasets, resulting in notable enhancements in their quality with respect to readiness for ML tasks, as assessed through established data quality metrics. Our findings demonstrate the efficacy of the framework in substantially augmenting the original dataset quality, achieved through the elimination of extraneous features and rows. This refinement yielded improved accuracy across both supervised and unsupervised learning methodologies. CONCLUSION: Our software presents an automated framework for data readiness, aimed at enhancing the integrity of raw datasets to facilitate robust utilization within ML pipelines. Through our proposed framework, we streamline the original dataset, resulting in enhanced accuracy and efficiency within the associated ML algorithms.
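The kind of row and feature pruning such a data-readiness tool automates can be sketched in pandas; the thresholds below are illustrative assumptions, not DREAMER's defaults.

```python
import pandas as pd

def prune_for_ml(df: pd.DataFrame,
                 max_col_missing: float = 0.4,
                 max_row_missing: float = 0.5) -> pd.DataFrame:
    """Drop mostly-missing columns/rows, constant features, and duplicates."""
    df = df.loc[:, df.isna().mean() <= max_col_missing]    # sparse columns
    df = df.loc[df.isna().mean(axis=1) <= max_row_missing]  # sparse rows
    df = df.loc[:, df.nunique(dropna=True) > 1]             # zero-variance cols
    return df.drop_duplicates()

raw = pd.DataFrame({"a": [1, 1, None, 4], "b": [None] * 4, "c": [7, 7, 7, 7]})
print(prune_for_ml(raw))  # keeps only column "a", deduplicated
```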


Subject(s)
Machine Learning , Humans , Datasets as Topic , Unsupervised Machine Learning , Algorithms , Supervised Machine Learning , Software
19.
Accid Anal Prev ; 205: 107666, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38901160

ABSTRACT

Only a few researchers have shown how environmental factors and road features relate to Autonomous Vehicle (AV) crash severity levels, and none have focused on data limitation problems such as small sample sizes, imbalanced datasets, and high-dimensional features. To address these problems, we analyzed an AV crash dataset (2019 to 2021) from the California Department of Motor Vehicles (CA DMV), which included 266 collision reports (51 of them causing injuries). We included external environmental variables by collecting various points of interest (POIs) and roadway features from OpenStreetMap (OSM) and Data San Francisco (SF). Random Over-Sampling Examples (ROSE) and the Synthetic Minority Over-Sampling Technique (SMOTE) were used to balance the dataset and increase the sample size, addressing the class imbalance and small-sample problems simultaneously. Mutual information, random forest, and XGBoost were utilized to address the high-dimensional feature-selection problem caused by including a variety of POI types as predictive variables. Because existing studies do not use consistent procedures, we compared the effectiveness of applying feature-selection preprocessing first against applying the data-balancing technique first. Our results showed that AV crash severity levels are related to vehicle manufacturer, vehicle damage level, collision type, vehicle movement, the parties involved in the crash, speed limit, and some types of POIs (areas near transportation, entertainment venues, public places, schools, and medical facilities). Both resampling methods and all three data preprocessing methods improved model performance, and the model that applied SMOTE and data balancing first performed best. The results suggest that over-sampling and feature selection can improve model prediction performance and identify new factors related to AV crash severity levels.
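The balance-first ordering that performed best can be expressed with imbalanced-learn and scikit-learn. The sketch below uses synthetic data in place of the CA DMV records; the dimensions loosely mirror the study's 266 reports.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Small, imbalanced, high-dimensional stand-in for the crash dataset
X, y = make_classification(n_samples=266, n_features=60, n_informative=8,
                           weights=[0.81, 0.19], random_state=0)

# Step 1: balance classes first (the best-performing order reported above)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)

# Step 2: select features by random-forest importance
selector = SelectFromModel(RandomForestClassifier(random_state=0))
selector.fit(X_bal, y_bal)
X_sel = selector.transform(X_bal)
print(X_bal.shape, "->", X_sel.shape)
```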


Subject(s)
Accidents, Traffic , Accidents, Traffic/statistics & numerical data , Accidents, Traffic/classification , Humans , Sample Size , California/epidemiology , Automobiles/statistics & numerical data , Datasets as Topic
20.
J Sports Med Phys Fitness ; 64(7): 640-649, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38916087

ABSTRACT

BACKGROUND: The analysis of athletic performance has always aroused great interest among sport scientists. This study utilized machine learning methods to build predictive models using a comprehensive CrossFit (CF) dataset, aiming to reveal valuable insights into the factors influencing performance and emerging trends. METHODS: Random forest (RF) and multiple linear regression (MLR) were employed to predict performance in four key weightlifting exercises within CF: clean and jerk, snatch, back squat, and deadlift. Performance was evaluated using R-squared (R2) values and mean squared error (MSE). Feature importance analysis was conducted using RF, XGBoost, and AdaBoost models. RESULTS: The RF model excelled in deadlift performance prediction (R2=0.80), while the MLR model demonstrated remarkable accuracy for the clean and jerk (R2=0.93). Across exercises, clean and jerk performance consistently emerged as a crucial predictor. The feature importance analysis revealed intricate relationships among exercises, with gender significantly impacting deadlift performance. CONCLUSIONS: This research advances our understanding of performance prediction in CF through machine learning techniques. It provides actionable insights for practitioners to optimize performance and demonstrates the potential for future advancements in data-driven sports analytics.
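A minimal version of the random-forest regression and its R2/MSE evaluation, with synthetic features standing in for the CrossFit dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Stand-in predictors: clean & jerk, snatch, back squat (kg) -> deadlift (kg);
# the linear relationship is simulated, not taken from the study
X = rng.uniform(40, 180, size=(500, 3))
y = 0.9 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(0, 10, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = RandomForestRegressor(n_estimators=300, random_state=1).fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"R2={r2_score(y_te, pred):.2f}  MSE={mean_squared_error(y_te, pred):.1f}")
print("feature importances:", model.feature_importances_.round(2))
```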


Subject(s)
Athletic Performance , Weight Lifting , Humans , Machine Learning , Datasets as Topic , Random Forest , Adult , Data Analysis , Male , Female