ABSTRACT
Hi-C and 3C-seq are powerful tools to study the 3D genomes of bacteria and archaea, whose small cell sizes and growth conditions are often intractable to detailed microscopic analysis. However, the circularity of prokaryotic genomes requires a number of tricks for Hi-C/3C-seq data analysis. Here, I provide a practical guide to using the HiC-Pro pipeline for Hi-C/3C-seq data obtained from prokaryotes.
Subject(s)
Genome, Bacterial , Software , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Prokaryotic Cells/metabolism , Genome, Archaeal , Archaea/genetics , Bacteria/genetics , Computational Biology/methods , Data Analysis
ABSTRACT
This research explores the application of quadratic polynomials in Python for advanced data analysis. The study demonstrates how quadratic models can effectively capture nonlinear relationships in complex datasets by leveraging Python libraries such as NumPy, Matplotlib, scikit-learn, and Pandas. The methodology involves fitting quadratic polynomials to the data using least-squares regression and evaluating the model fit using the coefficient of determination (R-squared). The results highlight the strong performance of the quadratic polynomial fit, as evidenced by high R-squared values, indicating the model's ability to explain a substantial proportion of the data variability. Comparisons with linear and cubic models further underscore the quadratic model's balance between simplicity and precision for many practical applications. The study also acknowledges the limitations of quadratic polynomials and proposes future research directions to enhance their accuracy and efficiency for diverse data analysis tasks. This research bridges the gap between theoretical concepts and practical implementation, providing an accessible Python-based tool for leveraging quadratic polynomials in data analysis.
This study examines how quadratic polynomials, which are mathematical equations used to model and understand patterns in data, can be effectively applied using Python, a versatile programming language with libraries suited for mathematical and visual analysis. Researchers have focused on the adaptability of these polynomials in various fields, from software analytics to materials science, in order to provide practical Python code examples. They also discussed the predictive accuracy of the method, confirmed through a statistical measure called R-squared, and acknowledged the need for future research to integrate more complex models for richer data interpretation.
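The fitting procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the synthetic data, the noise level, and the helper name `r_squared` are assumptions made for the example. It fits a degree-2 polynomial by least squares with `numpy.polyfit` and compares the coefficient of determination with that of a linear fit.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumed for illustration): y = 2x^2 - x + 0.5 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
y = 2.0 * x**2 - 1.0 * x + 0.5 + rng.normal(scale=0.5, size=x.size)

# Least-squares quadratic fit; coefficients are returned highest degree first.
coeffs = np.polyfit(x, y, deg=2)
r2_quadratic = r_squared(y, np.polyval(coeffs, x))

# Linear fit on the same data, for comparison.
r2_linear = r_squared(y, np.polyval(np.polyfit(x, y, deg=1), x))

print("quadratic coefficients:", coeffs)
print("R^2 quadratic:", r2_quadratic, "R^2 linear:", r2_linear)
```

On data with real curvature, the quadratic fit should show a clearly higher R-squared than the linear one. A cubic fit can be compared the same way by setting `deg=3`.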
Subject(s)
Data Analysis , Algorithms , Software , Least-Squares Analysis , Models, Statistical
ABSTRACT
Mass cytometry is a cutting-edge high-dimensional technology for profiling marker expression at the single-cell level, advancing clinical research in immune monitoring. Nevertheless, the vast data generated by cytometry by time-of-flight (CyTOF) poses a significant analytical challenge. To address this, we describe ImmCellTyper (https://github.com/JingAnyaSun/ImmCellTyper), a novel toolkit for CyTOF data analysis. This framework incorporates BinaryClust, an in-house developed semi-supervised clustering tool that automatically identifies main cell types. BinaryClust outperforms existing clustering tools in accuracy and speed, as shown in benchmarks with two datasets of approximately 4 million cells, matching the precision of manual gating by human experts. Furthermore, ImmCellTyper offers various visualisation and analytical tools, spanning from quality control to differential analysis, tailored to users' specific needs for a comprehensive CyTOF data analysis solution. The workflow includes five key steps: (1) batch effect evaluation and correction, (2) data quality control and pre-processing, (3) main cell lineage characterisation and quantification, (4) in-depth investigation of specific cell types; and (5) differential analysis of cell abundance and functional marker expression across study groups. Overall, ImmCellTyper combines expert biological knowledge in a semi-supervised approach to accurately deconvolute well-defined main cell lineages, while maintaining the potential of unsupervised methods to discover novel cell subsets, thus facilitating high-dimensional immune profiling.
Subject(s)
Data Analysis , Flow Cytometry , Single-Cell Analysis , Humans , Flow Cytometry/methods , Single-Cell Analysis/methods , Software , Cluster Analysis
ABSTRACT
The analysis of eye movements has proven valuable for understanding brain function and the neuropathology of various disorders. This research aims to utilize eye movement data analysis as a screening tool for differentiating between eight groups of pathologies, including scholar, neurologic, and postural disorders. Leveraging a dataset from 20 clinical centers, all employing AIDEAL and REMOBI eye movement technologies, this study extends prior research by considering a multi-annotation setting, incorporating information from recordings of saccade and vergence eye movement tests, and using contextual information (e.g., target signals, the latency of the eye movement relative to the target, and the confidence level of the recording quality) to improve accuracy while reducing noise interference. Additionally, we introduce a novel hybrid architecture that combines the weight-sharing feature of convolution layers with the long-range capabilities of the transformer architecture to improve model efficiency and reduce the computation cost by a factor of 3.36, while remaining competitive in terms of macro F1 score. Evaluated on two diverse datasets, our method demonstrates promising results, with the strongest discrimination obtained for Attention & Neurologic disorders, at a macro F1 score of up to 78.8%. The results indicate the effectiveness of our approach in accurately classifying eye movement data from different pathologies and different clinical centers, thus enabling the creation of an assistant tool in the future.
Subject(s)
Eye Movements , Humans , Eye Movements/physiology , Saccades/physiology , Data Analysis , Nervous System Diseases/diagnosis , Male
ABSTRACT
Background: Air pollution is one of the biggest problems in societies today. The intensity of indoor and outdoor air pollutants and the urbanization rate can cause or trigger many different diseases, especially lung cancer. In this context, this study aims to reveal the effects of indoor and outdoor air pollutants and of the urbanization rate on lung cancer cases. Methods: Panel data analysis is applied in this study. The research covers the period from 1990 to 2019, with annual data for all variables. The dependent variable in the research model is lung cancer cases per 100,000 people. The independent variables are the level of outdoor air pollution, the level of indoor air pollution, and the urbanization rate of countries. Results: In the model developed for the developed country group, the variable with the largest effect on lung cancer is the outdoor air pollution level. Conclusions: In parallel with the development of countries, the increase in industrial production waste, in other words worsening air quality, may cause an increase in lung cancer cases. Indoor air quality is also essential for human health; deterioration in this variable may adversely affect individuals' health, especially with respect to lung cancer.
Subject(s)
Air Pollution , Lung Neoplasms , Humans , Lung Neoplasms/etiology , Lung Neoplasms/epidemiology , Air Pollution/adverse effects , Air Pollution/analysis , Developed Countries/statistics & numerical data , Air Pollutants/analysis , Air Pollutants/adverse effects , Air Pollution, Indoor/adverse effects , Air Pollution, Indoor/analysis , Data Analysis , Urbanization , Income/statistics & numerical data , Environmental Exposure/adverse effects , Environmental Exposure/statistics & numerical data
ABSTRACT
Background: Hand, foot, and mouth disease (HFMD) is a notable infectious disease predominantly affecting infants and children worldwide. Previous studies on HFMD have primarily focused on natural patterns, such as seasonality, but research on the influence of important social time points is lacking. Several studies have indicated correlations between birthdays and certain disease outcomes. Objective: This study aimed to explore the association between birthdays and HFMD. Methods: Surveillance data on HFMD from 2008 to 2022 in Yunnan Province, China, were collected. We defined the period from 6 days before the birthday to the exact birthday as the "birthday week." The effect of the birthday week was measured by the proportion of cases occurring during this period, termed the "birthday week proportion." We conducted subgroup analyses to present the birthday week proportions across sexes, age groups, months of birth, and reporting years. Additionally, we used a modified Poisson regression model to identify conditional subgroups more likely to contract HFMD during the birthday week. Results: Among the 973,410 cases in total, 116,976 (12.02%) occurred during the birthday week, which is 6.27 times the average weekly proportion (7/365, 1.92%). While the birthday week proportions were similar between male and female individuals (68,849/564,725, 12.19% vs 48,127/408,685, 11.78%; χ²₁=153.25, P<.001), significant differences were observed among different age groups (χ²₃=47,145, P<.001) and months of birth (χ²₁₁=16,942, P<.001). Compared to other age groups, infants aged 0-1 year had the highest birthday week proportion (30,539/90,709, 33.67%), which is 17.57 times the average weekly proportion. Compared to other months, patients born from April to July and from October to December, the peak months of the HFMD epidemic, had higher birthday week proportions.
Additionally, a decreasing trend in birthday week proportions from 2008 to 2022 was observed, dropping from 33.74% (3914/11,600) to 2.77% (2254/81,372; Cochran-Armitage trend test: Z=-102.53, P<.001). The results of the modified Poisson regression model further supported the subgroup analyses findings. Compared with children aged >7 years, infants aged 0-1 year were more likely to contract HFMD during the birthday week (relative risk 1.182, 95% CI 1.177-1.185; P<.001). Those born during peak epidemic months exhibited a higher propensity for contracting HFMD during their birthday week. Compared with January, the highest relative risk was observed in May (1.087, 95% CI 1.084-1.090; P<.001). Conclusions: This study identified a novel "birthday week effect" of HFMD, particularly notable for infants approaching their first birthday and those born during peak epidemic months. Improvements in surveillance quality may explain the declining trend of the birthday week effect over the years. Higher exposure risk during the birthday period and potential biological mechanisms might also account for this phenomenon. Raising public awareness of the heightened risk during the birthday week could benefit HFMD prevention and control.
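The headline figures in this abstract follow from simple arithmetic, sketched below. The variable names are chosen for illustration only. The baseline assumes, as the abstract does, that cases unrelated to birthdays would fall in any given 7-day window with probability 7/365.

```python
# Counts reported in the abstract.
total_cases = 973_410
birthday_week_cases = 116_976

# Observed share of cases that fall in the 7-day birthday week.
birthday_week_proportion = birthday_week_cases / total_cases

# Share expected if onset dates were unrelated to birthdays.
expected_weekly_proportion = 7 / 365

# How many times larger the observed share is than the expected one.
ratio = birthday_week_proportion / expected_weekly_proportion

print(f"birthday week proportion: {birthday_week_proportion:.2%}")  # 12.02%
print(f"expected weekly proportion: {expected_weekly_proportion:.2%}")  # 1.92%
print(f"ratio: {ratio:.2f}")  # 6.27
```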
Subject(s)
Hand, Foot and Mouth Disease , Hand, Foot and Mouth Disease/epidemiology , China/epidemiology , Humans , Female , Male , Infant , Child, Preschool , Child , Adolescent , Infant, Newborn , Anniversaries and Special Events , Data Analysis
ABSTRACT
Computerized adaptive testing (CAT) has become a widely adopted test design for high-stakes licensing and certification exams, particularly in the health professions in the United States, due to its ability to tailor test difficulty in real time, reducing testing time while providing precise ability estimates. A key component of CAT is item response theory (IRT), which facilitates the dynamic selection of items based on examinees' ability levels during a test. Accurate estimation of item and ability parameters is essential for successful CAT implementation, necessitating convenient and reliable software to ensure precise parameter estimation. This paper introduces the irtQ R package, which simplifies IRT-based analysis and item calibration under unidimensional IRT models. While it does not directly simulate CAT, it provides essential tools to support CAT development, including parameter estimation using marginal maximum likelihood estimation via the expectation-maximization algorithm, pretest item calibration through fixed item parameter calibration and fixed ability parameter calibration methods, and examinee ability estimation. The package also enables users to compute item and test characteristic curves and information functions necessary for evaluating the psychometric properties of a test. This paper illustrates the key features of the irtQ package through examples using simulated datasets, demonstrating its utility in IRT applications such as test data analysis and ability scoring. By providing a user-friendly environment for IRT analysis, irtQ significantly enhances the capacity for efficient adaptive testing research and operations. Finally, the paper highlights additional core functionalities of irtQ, emphasizing its broader applicability to the development and operation of IRT-based assessments.
Subject(s)
Educational Measurement , Psychometrics , Software , Humans , Educational Measurement/methods , Educational Measurement/standards , Calibration , Algorithms , United States , Data Analysis , Health Occupations/education
ABSTRACT
The epidemiology of idiopathic inflammatory myopathies (IIMs) varies by country. Investigating the epidemiological profile of Thai patients with IIMs could help to inform public health policy, potentially leading to cost-reducing strategies. We aimed to assess the prevalence and incidence of IIMs in the Thai population between 2017 and 2020. A descriptive epidemiological study was conducted on patients aged 18 years or older, using data from the Information and Communication Technology Center, Ministry of Public Health, with a primary diagnosis of dermatopolymyositis, as indicated by ICD-10 code M33. The prevalence and incidence of IIMs were analyzed with their 95% confidence intervals (CIs) and then categorized by sex and region. In 2017, the IIM cases numbered 9,074 among 65,204,797 Thais, resulting in a prevalence of 13.9 per 100,000 population (95% CI 13.6-14.2). IIMs were slightly more prevalent among women than men (16.8 vs 10.9 per 100,000). Between 2018 and 2020, the incidence of IIMs slightly declined from 5.09 (95% CI 4.92-5.27) in 2017 and 4.92 (95% CI 4.76-5.10) in 2019 to 4.43 (95% CI 4.27-4.60) per 100,000 person-years in 2020. The peak age group was 50-69 years. Between 2018 and 2020, the majority of cases occurred in southern Thailand, with incidence rates of 7.60, 8.34, and 8.74 per 100,000 person-years. IIMs are uncommon among Thais, with a peak incidence in individuals between 60 and 69, especially in southern Thailand. The incidence of IIMs decreased between 2019 and 2020, most likely due to the COVID-19 pandemic, which reduced reports and investigations.
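The 2017 prevalence figure can be reproduced from the reported counts, as sketched below. The abstract does not say how the confidence interval was computed. The normal approximation to a Poisson count used here is an assumption, although it does reproduce the reported interval to one decimal place.

```python
import math

# Counts reported in the abstract for 2017.
cases = 9_074
population = 65_204_797
per = 100_000

# Prevalence per 100,000 population.
prevalence = cases / population * per

# Assumed method: treat the case count as Poisson, so SE(count) = sqrt(count),
# and build a 95% interval with the normal approximation.
standard_error = math.sqrt(cases) / population * per
ci_low = prevalence - 1.96 * standard_error
ci_high = prevalence + 1.96 * standard_error

print(f"{prevalence:.1f} per 100,000 (95% CI {ci_low:.1f}-{ci_high:.1f})")
```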
Subject(s)
Myositis , Humans , Thailand/epidemiology , Male , Female , Incidence , Middle Aged , Prevalence , Adult , Aged , Myositis/epidemiology , Young Adult , Public Health , Adolescent , COVID-19/epidemiology , Aged, 80 and over , Data Analysis
ABSTRACT
BACKGROUND: Atrial fibrillation (AF) is the most common type of arrhythmia. Heart rate variability (HRV) may be associated with AF risk. The aim of this study was to test HRV indices and arrhythmias as predictors of paroxysmal AF based on 24-hour dynamic electrocardiogram recordings of patients. METHODS: A total of 199 patients with paroxysmal AF (AF group) and 204 elderly volunteers over 60 years old (Control group) who underwent a 24-hour dynamic electrocardiogram from August 2022 to March 2023 were included. Time-domain indices, frequency-domain indices, and arrhythmia data of the two groups were classified and measured. Binary logistic regression analysis was performed on variables with significant differences to identify independent risk factors. A nomogram prediction model was established, and the sum of individual scores of each variable was calculated. RESULTS: Gender, age, body mass index and low-density lipoprotein (LDL) did not differ significantly between AF and Control groups (p > 0.05), whereas significant group differences were found for smoking, hypertension, diabetes, and high-density lipoprotein (HDL) (p < 0.05). The standard deviation of all normal to normal (NN) R-R intervals (SDNN), standard deviation of 5-minute average NN intervals (SDANN), root mean square of successive NN interval differences (rMSSD), percentage of NN intervals differing by more than 50 ms from the preceding interval (pNN50), low-frequency/high-frequency (LF/HF), LF, premature atrial contractions (PACs), atrial tachycardia (AT), T-wave index, and ST-segment index differed significantly between the two groups. Logistic regression analysis identified rMSSD, PACs, and AT as independent predictors of AF. For each unit increase in rMSSD and PACs, the odds of developing AF were multiplied by 1.0357 and 1.0005, respectively. For each unit increase in AT, the odds of developing AF were multiplied by 0.9976, that is, they decreased. The total score of the nomogram prediction model ranged from 0 to 110.
CONCLUSION: The autonomic nervous system (ANS) plays a pivotal role in the occurrence and development of AF. The individualized nomogram prediction model of AF occurrence contributes to the early identification of high-risk patients with AF.
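Per-unit odds ratios this close to 1 are easier to interpret over larger changes in the predictor, because odds ratios from a logistic model compound multiplicatively. The sketch below illustrates this with the reported values. The function name and the chosen increments (10 units of rMSSD, 100 units of AT) are hypothetical, and the abstract does not state the measurement units.

```python
import math


def odds_multiplier(odds_ratio_per_unit, delta):
    """Factor by which the odds change when a predictor rises by `delta` units.

    In logistic regression the log-odds are linear in the predictor, so the
    per-unit odds ratio compounds: OR(delta) = OR(1) ** delta.
    """
    return odds_ratio_per_unit ** delta


# Per-unit odds ratios reported in the abstract.
or_rmssd = 1.0357
or_at = 0.9976

# The regression coefficient is the natural log of the per-unit odds ratio.
beta_rmssd = math.log(or_rmssd)

print(round(odds_multiplier(or_rmssd, 10), 2))  # about 1.42
print(round(odds_multiplier(or_at, 100), 2))    # about 0.79
```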
Subject(s)
Atrial Fibrillation , Heart Rate , Humans , Atrial Fibrillation/physiopathology , Atrial Fibrillation/diagnosis , Atrial Fibrillation/epidemiology , Heart Rate/physiology , Male , Female , Middle Aged , Aged , Risk Factors , Electrocardiography/methods , Nomograms , Electrocardiography, Ambulatory/methods , Data Analysis , Arrhythmias, Cardiac/physiopathology , Arrhythmias, Cardiac/diagnosis , Arrhythmias, Cardiac/epidemiology , Arrhythmias, Cardiac/etiology
ABSTRACT
BACKGROUND: Parental domestic violence and abuse (DVA), mental ill-health (MH), and substance misuse (SU) can have a negative impact on both parents and children. However, it remains unclear if and how parental DVA, MH, and SU cluster and the impacts this clustering might have. We examined how parental DVA, MH, and SU cluster during early childhood, the demographic/contextual profiles of these clusters, and how these clusters relate to child MH trajectories. METHODS: We examined data from 15,377 families in the UK Millennium Cohort Study. We used: (1) latent class analysis to create groups differentially exposed to parental DVA, MH, and SU at age three; (2) latent growth curve modelling to create latent trajectories of child MH from ages 3-17; and (3) a case-weight approach to relate latent classes to child MH trajectories. RESULTS: We identified three latent classes: high-frequency alcohol use (11.9%), elevated adversity (3.5%), and low-level adversity (84.6%). Children in the elevated adversity class had higher probabilities of being from low-socioeconomic backgrounds and having White, younger parents. Children exposed to elevated adversity displayed worse MH at age three (intercept = 2.274; p < 0.001) compared with the low-level adversity class (intercept = 2.228; p < 0.001) and the high-frequency alcohol use class (intercept = 2.068; p < 0.001). However, latent growth factors (linear and quadratic terms) of child MH did not differ by latent class. CONCLUSIONS: Parental DVA, MH, and SU cluster during early childhood and this has a negative impact on children's MH at age three, leading to similar levels of poor MH across time. Intervening early to prevent the initial deterioration, using a syndemic approach, is essential.
Subject(s)
Domestic Violence , Mental Disorders , Substance-Related Disorders , Humans , United Kingdom/epidemiology , Child , Female , Male , Substance-Related Disorders/epidemiology , Domestic Violence/statistics & numerical data , Domestic Violence/psychology , Adolescent , Child, Preschool , Cohort Studies , Mental Disorders/epidemiology , Parents/psychology , Latent Class Analysis , Mental Health/statistics & numerical data , Data Analysis , Secondary Data Analysis
ABSTRACT
Statistical regression models are used for predicting outcomes based on the values of some predictor variables or for describing the association of an outcome with predictors. With a data set at hand, a regression model can be easily fit with standard software packages. This bears the risk that data analysts may rush to perform sophisticated analyses without sufficient knowledge of the basic properties of, associations in, and errors of their data, leading to wrong interpretations and to presentations of the modeling results that lack clarity. Ignorance about special features of the data such as redundancies or particular distributions may even invalidate the chosen analysis strategy. Initial data analysis (IDA) is a prerequisite to regression analyses as it provides knowledge about the data needed to confirm the appropriateness of or to refine a chosen model building strategy, to interpret the modeling results correctly, and to guide the presentation of modeling results. In order to facilitate reproducibility, IDA needs to be preplanned, an IDA plan should be included in the general statistical analysis plan of a research project, and results should be well documented. Biased statistical inference of the final regression model can be minimized if IDA abstains from evaluating associations of outcome and predictors, a key principle of IDA. We give advice on which aspects to consider in an IDA plan for data screening in the context of regression modeling to supplement the statistical analysis plan. We illustrate this IDA plan for data screening in an example of a typical diagnostic modeling project and give recommendations for data visualizations.
Subject(s)
Models, Statistical , Humans , Regression Analysis , Data Interpretation, Statistical , Multivariate Analysis , Reproducibility of Results , Software , Data Analysis
ABSTRACT
Background: The COVID-19 pandemic has revealed significant challenges in disease forecasting and in developing a public health response, emphasizing the need to manage missing data from various sources in making accurate forecasts. Objective: We aimed to show how handling missing data can affect estimates of the COVID-19 incidence rate (CIR) in different pandemic situations. Methods: This study used data from the COVID-19/SARS-CoV-2 surveillance system at the National Institute of Hygiene and Epidemiology, Vietnam. We separated the available data set into 3 distinct periods: zero COVID-19, transition, and new normal. We randomly removed 5% to 30% of the data, in 5% increments, from the daily COVID-19 caseload variable, so that the removed values were missing completely at random. We selected 7 analytical methods to assess the effects of handling missing data and calculated statistical and epidemiological indices to measure the effectiveness of each method. Results: Our study examined missing data imputation performance across 3 study time periods: zero COVID-19 (n=3149), transition (n=1290), and new normal (n=9288). Imputation analyses showed that K-nearest neighbor (KNN) had the lowest mean absolute percentage change (APC) in CIR across the range (5% to 30%) of missing data. For instance, with 15% missing data, KNN resulted in 10.6%, 10.6%, and 9.7% average bias across the zero COVID-19, transition, and new normal periods, compared to 39.9%, 51.9%, and 289.7% with the maximum likelihood method. The autoregressive integrated moving average model showed the greatest mean APC in the mean number of confirmed cases of COVID-19 during each COVID-19 containment cycle (CCC) when we imputed the missing data in the zero COVID-19 period, rising from 226.3% at the 5% missing level to 6955.7% at the 30% missing level. Imputing missing data with median imputation methods had the lowest bias in the average number of confirmed cases in each CCC at all levels of missing data.
In detail, in the 20% missing scenario, while median imputation had an average bias of 16.3% for confirmed cases in each CCC, which was lower than the KNN figure, maximum likelihood imputation showed a bias on average of 92.4% for confirmed cases in each CCC, which was the highest figure. During the new normal period in the 25% and 30% missing data scenarios, KNN imputation had average biases for CIR and confirmed cases in each CCC ranging from 21% to 32% for both, while maximum likelihood and moving average imputation showed biases on average above 250% for both CIR and confirmed cases in each CCC. Conclusions: Our study emphasizes the importance of understanding that the specific imputation method used by investigators should be tailored to the specific epidemiological context and data collection environment to ensure reliable estimates of the CIR.
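The evaluation design described above (remove values completely at random, impute, then measure the change in a summary statistic) can be sketched as follows. This is a simplified illustration using only the Python standard library, not the authors' pipeline. The toy series, the helper names, the fixed seed, and the use of the mean daily caseload in place of the CIR are all assumptions. Median imputation is shown because it needs no external library; KNN or maximum likelihood imputation would replace `impute_median`.

```python
import random
import statistics


def remove_mcar(values, fraction, seed=0):
    """Replace a random `fraction` of entries with None (missing completely at random)."""
    rng = random.Random(seed)
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_median(values):
    """Fill missing entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def absolute_percentage_change(true_value, estimate):
    """Absolute change of the estimate relative to the true value, in percent."""
    return abs(estimate - true_value) / true_value * 100


# Hypothetical daily caseload series (not data from the study).
daily_cases = [12, 15, 9, 30, 22, 18, 25, 40, 35, 28,
               19, 14, 11, 16, 21, 27, 33, 38, 26, 17]
true_mean = statistics.mean(daily_cases)

results = {}
for percent in range(5, 35, 5):  # 5%, 10%, ..., 30% missing
    masked = remove_mcar(daily_cases, percent / 100, seed=42)
    imputed = impute_median(masked)
    results[percent] = absolute_percentage_change(true_mean, statistics.mean(imputed))

for percent, apc in results.items():
    print(f"{percent}% missing: APC of the mean = {apc:.1f}%")
```

A fuller comparison would repeat each missingness level over many random seeds and average the APC, since a single draw is noisy.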
Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , Incidence , Vietnam/epidemiology , Data Analysis , Data Interpretation, Statistical , Pandemics , Secondary Data Analysis
ABSTRACT
Previous correlative and modeling approaches indicate influences of environmental factors on COVID-19 spread through atmospheric conditions' impact on virus survival and transmission or host susceptibility. However, causal connections from environmental factors to the pandemic, mediated by human mobility, received less attention. We use the technique of Convergent Cross Mapping to identify the causal connections, beyond correlation at the country level, between pairs of variables associated with weather conditions, human mobility, and the number of COVID-19 cases for 32 European states. Here, we present data-based evidence that the relatively reduced number of cases registered in Northern Europe is related to the causal impact of precipitation on people's decision to spend more time at home and that the relatively large number of cases observed in Southern Europe is linked to people's choice to spend time outdoors during warm days. We also emphasize the channels of the significant impact of the pandemic on human mobility. The weather-human mobility connections inferred here are relevant not only for COVID-19 spread but also for any other virus transmitted through human interactions. These results may help authorities and public health experts contain possible future waves of the COVID-19 pandemic or limit the threats of similar human-to-human transmitted viruses.
Subject(s)
COVID-19 , SARS-CoV-2 , Weather , COVID-19/epidemiology , COVID-19/transmission , COVID-19/virology , Humans , Europe/epidemiology , SARS-CoV-2/isolation & purification , SARS-CoV-2/pathogenicity , Pandemics , Data Analysis
ABSTRACT
BACKGROUND: Large language models including GPT-4 (OpenAI) have opened new avenues in health care and qualitative research. Traditional qualitative methods are time-consuming and require expertise to capture nuance. Although large language models have demonstrated enhanced contextual understanding and inferencing compared with traditional natural language processing, their performance in qualitative analysis versus that of humans remains unexplored. OBJECTIVE: We evaluated the effectiveness of GPT-4 versus human researchers in qualitative analysis of interviews with patients with adult-acquired buried penis (AABP). METHODS: Qualitative data were obtained from semistructured interviews with 20 patients with AABP. Human analysis involved a structured 3-stage process: initial observations, line-by-line coding, and consensus discussions to refine themes. In contrast, artificial intelligence (AI) analysis with GPT-4 underwent two phases: (1) a naïve phase, where GPT-4 outputs were independently evaluated by a blinded reviewer to identify themes and subthemes and (2) a comparison phase, where AI-generated themes were compared with human-identified themes to assess agreement. We used a general qualitative description approach. RESULTS: The study population (N=20) comprised predominantly White (17/20, 85%), married (12/20, 60%), heterosexual (19/20, 95%) men, with a mean age of 58.8 years and BMI of 41.1 kg/m². Human qualitative analysis identified "urinary issues" in 95% (19/20) and GPT-4 in 75% (15/20) of interviews, with the subtheme "spray or stream" noted in 60% (12/20) and 35% (7/20), respectively. "Sexual issues" were prominent (19/20, 95% humans vs 16/20, 80% GPT-4), although humans identified a wider range of subthemes, including "pain with sex or masturbation" (7/20, 35%) and "difficulty with sex or masturbation" (4/20, 20%).
Both analyses similarly highlighted "mental health issues" (11/20, 55%, both), although humans coded "depression" more frequently (10/20, 50% humans vs 4/20, 20% GPT-4). Humans frequently cited "issues using public restrooms" (12/20, 60%) as impacting social life, whereas GPT-4 emphasized "struggles with romantic relationships" (9/20, 45%). "Hygiene issues" were consistently recognized (14/20, 70% humans vs 13/20, 65% GPT-4). Humans uniquely identified "contributing factors" as a theme in all interviews. There was moderate agreement between human and GPT-4 coding (κ=0.401). Reliability assessments of GPT-4's analyses showed consistent coding for themes including "body image struggles," "chronic pain" (10/10, 100%), and "depression" (9/10, 90%). Other themes like "motivation for surgery" and "weight challenges" were reliably coded (8/10, 80%), while less frequent themes were variably identified across multiple iterations. CONCLUSIONS: Large language models including GPT-4 can effectively identify key themes in analyzing qualitative health care data, showing moderate agreement with human analysis. While human analysis provided a richer diversity of subthemes, the consistency of AI suggests its use as a complementary tool in qualitative research. With AI rapidly advancing, future studies should iterate analyses and circumvent token limitations by segmenting data, furthering the breadth and depth of large language model-driven qualitative analyses.
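The agreement statistic reported above (κ) corrects raw agreement for the agreement expected by chance. The abstract does not say which kappa variant was used. The sketch below assumes Cohen's kappa for two raters, and the example codes are hypothetical, not data from the study.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(label) / n) * (rater_b.count(label) / n)
                   for label in labels)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: 1 = theme coded as present in an interview, 0 = absent.
human_codes = [1, 1, 0, 0]
model_codes = [1, 0, 0, 0]
print(cohens_kappa(human_codes, model_codes))  # 0.5
```

In this toy example the raters agree on 3 of 4 items (p_o = 0.75) and chance agreement is 0.5, giving κ = 0.5.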
Subject(s)
Qualitative Research , Humans , Male , Adult , Middle Aged , Data Analysis , Research Personnel/psychology , Research Personnel/statistics & numerical data , Aged
ABSTRACT
Glial scar formation represents a fundamental response to central nervous system (CNS) injuries. It is mainly characterized by a well-defined spatial rearrangement of reactive astrocytes and microglia. The mechanisms underlying glial scar formation have been extensively studied, yet quantitative descriptors of the spatial arrangement of reactive glial cells remain limited. Here, we present a novel approach using point pattern analysis (PPA) and topological data analysis (TDA) to quantify spatial patterns of reactive glial cells after experimental ischemic stroke in mice. We provide open and reproducible tools using R and Julia to quantify spatial intensity, cell covariance and conditional distribution, cell-to-cell interactions, and short/long-scale arrangement, which collectively disentangle the arrangement patterns of the glial scar. This approach unravels a substantial divergence in the distribution of GFAP+ and IBA1+ cells after injury that conventional analysis methods cannot fully characterize. PPA and TDA are valuable tools for studying the complex spatial arrangement of reactive glia and other nervous cells following CNS injuries and have potential applications for evaluating glial-targeted restorative therapies.
Subject(s)
Astrocytes , Cicatrix , Neuroglia , Animals , Mice , Cicatrix/pathology , Neuroglia/pathology , Astrocytes/pathology , Microglia/pathology , Ischemic Stroke/pathology , Data Analysis , Disease Models, Animal , Male , Glial Fibrillary Acidic Protein/metabolism , Mice, Inbred C57BL
Subject(s)
Heart Arrest , Humans , Sweden/epidemiology , Heart Arrest/epidemiology , Incidence , Female , Male , Aged , Middle Aged , Adult , Aged, 80 and over , Data Analysis
ABSTRACT
Within the realm of health care quality assessment, quality assurance and safety grading systems play a vital role in gauging hospital performance and communicating results to the general public. The primary objective of this review is to analyze the hospitals in California through the lens of Leapfrog Safety Grades and discuss the complex interplay of geographical location, hospital size, and larger system affiliation status. Leapfrog Safety Grades, hospital characteristics, and geographic information were collected. Hospitals were categorized by geographic region, size, rural/urban classification, and larger system affiliation status. Of the 284 hospitals included in the study, 95 were given a grade of A, 68 a grade of B, 93 a grade of C, 23 a grade of D, and 2 a grade of F, and 3 were not graded. Most hospitals in California were classified as urban, with 183 falling under this category. The mean (± SD) number of hospital beds was 227 ± 47.57. On average, hospitals that received a grade of D were significantly smaller in size than those that received a grade of A, while hospitals that received a grade of B or C were similar in size. A total of 107 hospitals were affiliated with a larger health care system. About 70% of hospitals affiliated with a system received an A or B grade, while 50% of unaffiliated hospitals received an A or B grade. Results of this study demonstrate a need for improving health care access and quality in medically underserved urban and rural areas. Hospitals affiliated with a larger health care system received higher grades than unaffiliated hospitals, suggesting that affiliation may also play a role in the implementation and mitigation of factors that contribute to Leapfrog Safety Grades.
Subject(s)
Hospitals , Patient Safety , California , Humans , Patient Safety/standards , Hospitals/standards , Quality Assurance, Health Care , Quality Indicators, Health Care , Data Analysis
ABSTRACT
This article presents experience with applying artificial intelligence in the research process. It provides general information about basic machine learning concepts (clustering and visualization) and considers in more detail a case of clinical testing. The effectiveness of applying data analysis methods and tools as a research stage is demonstrated with a case of processing medical information using machine learning algorithms: assessing the diagnostic value of a proposed indicator (FTF) for determining target age groups. Implementing this digital transformation approach improves the operational effectiveness of research, as well as the quality and availability of the final technological products being developed, namely software for solving expert problems.