Results 1 - 20 of 1,305
1.
J Nucl Med ; 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39089812

ABSTRACT

Total metabolic tumor volume (TMTV) is prognostic in lymphoma. However, cutoff values for risk stratification vary markedly according to the tumor delineation method used. We aimed to create a standardized TMTV benchmark dataset allowing TMTV to be tested and applied as a reproducible biomarker. Methods: Sixty baseline 18F-FDG PET/CT scans were identified with a range of disease distributions (20 follicular, 20 Hodgkin, and 20 diffuse large B-cell lymphoma). TMTV was measured by 12 nuclear medicine experts, each analyzing 20 cases split across subtypes, with each case processed by 3-4 readers. LIFEx or ACCURATE software was chosen according to reader preference. Analysis was performed stepwise: TMTV1 with automated preselection of lesions using an SUV of at least 4 and a volume of at least 3 cm3, with single-click removal of physiologic uptake; TMTV2 with additional single-click removal of reactive bone marrow and spleen; TMTV3 with manual editing to remove other physiologic uptake, if required; and TMTV4 with optional addition of lesions using mouse clicks with an SUV of at least 4 (no volume threshold). Results: The final TMTV (TMTV4) ranged from 8 to 2,288 cm3, showing excellent agreement among all readers in 87% of cases (52/60), with a difference of less than 10% or less than 10 cm3. In 70% of the cases, TMTV4 equaled TMTV1, requiring no additional reader interaction. Differences in TMTV4 were related exclusively to reader interpretation of lesion inclusion or removal of physiologic high-uptake regions, not to the choice of software. For 5 cases, large TMTV differences (>25%) were due to disagreement about inclusion of diffuse splenic uptake. Conclusion: The proposed segmentation method enabled highly reproducible TMTV measurements, with minimal reader interaction in 70% of the patients. The inclusion or exclusion of diffuse splenic uptake requires definition of specific criteria according to lymphoma subtype. The publicly available benchmark allows comparison of study results and could serve as a reference for testing improvements from other segmentation approaches.
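A minimal sketch of the automated TMTV1 preselection step described above (SUV threshold of 4, minimum component volume of 3 cm3), assuming a NumPy SUV volume and a known voxel volume; this is an illustration, not the LIFEx or ACCURATE implementation:

```python
import numpy as np
from scipy import ndimage

def tmtv1(suv: np.ndarray, voxel_volume_cm3: float,
          suv_cutoff: float = 4.0, min_volume_cm3: float = 3.0) -> float:
    """Sum the volumes of connected components with SUV >= cutoff."""
    mask = suv >= suv_cutoff                    # SUV threshold
    labels, n_lesions = ndimage.label(mask)     # 3D connected components
    total = 0.0
    for lesion in range(1, n_lesions + 1):
        vol = (labels == lesion).sum() * voxel_volume_cm3
        if vol >= min_volume_cm3:               # drop components under 3 cm3
            total += vol
    return total                                # TMTV in cm3
```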

2.
Ecotoxicol Environ Saf ; 283: 116796, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39094451

ABSTRACT

BACKGROUND: Previous studies reported that lead (Pb) exposure induces adverse health effects at high exposure concentrations; however, there are limited data comparing the sensitivity of different health outcomes at low blood Pb levels. OBJECTIVES: To compare the sensitivity of blood parameters and a genotoxic biomarker among workers exposed to low blood Pb levels (<20 µg/dl), and to estimate a benchmark dose (BMD). METHODS: Pb-exposed workers were recruited from a lead-acid storage battery plant. Their blood lead levels (BLLs) were measured, and blood parameters and micronucleus (MN) frequencies were determined. Multivariate linear or Poisson regression was used to analyze the relationships of blood parameters or MN frequencies with BLLs. Two BMD software packages were used to calculate the BMD and its 95% lower confidence limit (BMDL) for BLLs. RESULTS: The median BLL for the 611 workers was 10.44 µg/dl, with 25th and 75th percentiles of 7.37 and 14.62 µg/dl. Blood parameters correlated significantly and negatively with BLLs, whereas MN frequencies correlated positively with BLLs (all P<0.05). Results from the two BMD software packages revealed that the dichotomous model was superior to the continuous model; the BMDL for BLL derived from red blood cell count (RBC) was 15.11 µg/dl, from hemoglobin (HGB) 8.50 µg/dl, from mean corpuscular hemoglobin (MCH) 7.87 µg/dl, from mean corpuscular hemoglobin concentration (MCHC) 3.98 µg/dl, from mean corpuscular volume (MCV) 11.44 µg/dl, and from hematocrit (HCT) 6.65 µg/dl. The conservative BMDL obtained from the MN data was 7.52 µg/dl. CONCLUSION: Our study shows that low-dose Pb exposure caused a decrease in blood parameters and an increase in MN frequencies, and that the genotoxic biomarker was more sensitive than most blood parameters. BMDLs for BLL derived from MN frequencies and the red blood cell indicators should be considered as new occupational exposure limits, and the MN assay should be considered as part of occupational health examinations.
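For illustration, a Poisson regression of micronucleus counts on blood lead levels of the kind described in the methods could be run as follows; the data frame below is synthetic and the covariate is hypothetical, not the study's actual model specification:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "bll": rng.uniform(2, 20, n),                  # blood lead level, ug/dl
    "age": rng.integers(20, 60, n),                # hypothetical covariate
})
df["mn_count"] = rng.poisson(np.exp(0.1 + 0.05 * df["bll"]))  # synthetic counts

fit = smf.glm("mn_count ~ bll + age", data=df,
              family=sm.families.Poisson()).fit()
print(fit.params["bll"])   # positive coefficient: MN frequency rises with BLL
```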

3.
J Cheminform ; 16(1): 92, 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39095917

ABSTRACT

Protein language models (PLMs) play a dominant role in protein representation learning. Most existing PLMs regard proteins as sequences of 20 natural amino acids. The problem with this representation is that it simply divides the protein sequence into individual amino acids, ignoring the fact that certain residues often occur together; it is therefore inappropriate to view amino acids as isolated tokens. Instead, PLMs should recognize frequently occurring combinations of amino acids as single tokens. In this study, we use the byte-pair encoding (BPE) and unigram algorithms to construct advanced residue vocabularies for protein sequence tokenization, and we show that PLMs pre-trained with these advanced vocabularies outperform those trained with simple vocabularies on downstream tasks. Furthermore, we introduce PETA, a comprehensive benchmark for systematically evaluating PLMs. We find that vocabularies comprising 50 and 200 elements achieve optimal performance. Our code, model weights, and datasets are available at https://github.com/ginnm/ProteinPretraining. SCIENTIFIC CONTRIBUTION: This study introduces advanced protein sequence tokenization analysis, leveraging the byte-pair encoding and unigram algorithms. By recognizing frequently occurring combinations of amino acids as single tokens, the proposed method enhances the performance of PLMs on downstream tasks. Additionally, we present PETA, a new comprehensive benchmark for the systematic evaluation of PLMs, demonstrating that vocabularies of 50 and 200 elements offer optimal performance.
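As a sketch of the tokenization idea, a BPE vocabulary can be trained on raw protein sequences with the Hugging Face tokenizers library; the two sequences and the vocabulary size of 50 below are illustrative, not the PETA training corpus or setup:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer

seqs = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
        "MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDI"]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
trainer = BpeTrainer(vocab_size=50, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(seqs, trainer)   # merges frequent residue pairs

# Frequently co-occurring residues now appear as multi-character tokens
print(tokenizer.encode(seqs[0]).tokens)
```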

4.
Chemosphere ; 364: 143010, 2024 Aug 03.
Article in English | MEDLINE | ID: mdl-39098349

ABSTRACT

Dosimetry modeling and point-of-departure (POD) estimation using in vitro data are essential for mechanism-based hazard identification and risk assessment. This study aimed to develop a putative adverse outcome pathway (AOP) for humidifier disinfectant (HD) substances used in South Korea through a systematic review and benchmark dose (BMD) modeling. We collected in vitro toxicological studies on HD substances, including polyhexamethylene guanidine hydrochloride (PHMG-HCl), PHMG phosphate (PHMG-p), a mixture of 5-chloro-2-methyl-4-isothiazolin-3-one and 2-methyl-4-isothiazolin-3-one (CMIT/MIT), CMIT, and MIT, from scientific databases. A total of 193 sets of dose-response data were extracted from 34 articles reporting in vitro experimental results on HD toxicity. The risk of bias (RoB) in each study was assessed following the Office of Health Assessment and Translation (OHAT) guideline. The BMD of each HD substance at different toxicity endpoints was estimated using the US Environmental Protection Agency (EPA) BMD software (BMDS). Interspecies and interorgan differences and the most critical effects in the toxicity of the HD substances were analyzed using the 95% lower confidence limit of the BMD (BMDL). We identified a critical molecular event and the cells most susceptible to each HD substance and constructed an AOP of PHMG-p- or CMIT/MIT-induced damage. Notably, PHMG-p induced ATP depletion at the lowest in vitro concentration, followed by endoplasmic reticulum (ER) stress, epithelial-to-mesenchymal transition (EMT), and inflammation, leading to fibrosis. CMIT/MIT enhanced mitochondrial reactive oxygen species (ROS) production, oxidative stress, and mitochondrial dysfunction, resulting in cell death. Our approach will increase the current understanding of the effects of HD substances on human health and contribute to evidence-based risk assessment of these compounds.
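A schematic benchmark-dose calculation (not the EPA BMDS software itself) can be sketched by fitting a Hill dose-response curve and solving for the dose that produces a 10% change from control; the dose-response points below are invented for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit, brentq

dose = np.array([0.0, 1.0, 3.0, 10.0, 30.0])
resp = np.array([1.00, 0.95, 0.80, 0.45, 0.20])    # synthetic viability data

def hill(d, top, bottom, ec50, n):
    return bottom + (top - bottom) / (1.0 + (d / ec50) ** n)

popt, _ = curve_fit(hill, dose, resp, p0=[1.0, 0.1, 5.0, 1.0],
                    bounds=([0.0, 0.0, 1e-3, 0.1], [2.0, 1.0, 100.0, 10.0]))
control = hill(0.0, *popt)
target = control * (1.0 - 0.10)                    # benchmark response: 10% drop
bmd10 = brentq(lambda d: hill(d, *popt) - target, 1e-6, 100.0)
print(f"BMD10 ~ {bmd10:.2f} (same units as dose)")
```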

5.
Stud Health Technol Inform ; 316: 1647-1651, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176526

ABSTRACT

Similarity and clustering tasks based on patient-level data extracted from electronic health records suffer from the curse of dimensionality and a lack of inter-patient data comparability. Indeed, in many health institutions there are far more variables, and ways of expressing those variables to represent patients, than there are patients sharing the same set of data. To lower redundancy and increase interoperability, one strategy is to map data to semantic-driven representations through medical knowledge graphs such as SNOMED-CT. However, patient similarity metrics based on this knowledge-graph information lack quantitative evaluation and comparison with purely data-driven methods. The reasons are twofold. Firstly, it is hard to conceptually assess and formalize a gold-standard similarity between patients, resulting in poor inter-annotator agreement in qualitative evaluations. Secondly, the community has lacked a clear benchmark for comparing existing metrics developed by scientific communities from fields as varied as ontology engineering, data science, and medical informatics. This study addresses the known challenges of evaluating patient similarity by proposing SIMpat, a synthetic benchmark to quantitatively evaluate available metrics on controlled cohorts, which could later be used to assess their sensitivity to aspects such as the sparsity of variables or the specificities of patient disease patterns.
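One simple knowledge-graph similarity of the kind discussed above can be sketched as Jaccard overlap between patients' code sets expanded to their ontology ancestors; the toy parent map below is hypothetical, not actual SNOMED-CT content, and SIMpat itself is not reproduced here:

```python
# Hypothetical miniature "is-a" hierarchy (child -> parent)
parents = {"dm2": "dm", "dm1": "dm", "dm": "disorder", "asthma": "disorder"}

def ancestors(code: str) -> set[str]:
    out = {code}
    while code in parents:
        code = parents[code]
        out.add(code)
    return out

def patient_similarity(codes_a: set[str], codes_b: set[str]) -> float:
    a = set().union(*(ancestors(c) for c in codes_a))
    b = set().union(*(ancestors(c) for c in codes_b))
    return len(a & b) / len(a | b)        # Jaccard on ancestor-expanded sets

print(patient_similarity({"dm2"}, {"dm1"}))  # 0.5: siblings share 'dm' lineage
```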


Subject(s)
Benchmarking; Electronic Health Records; Humans; Systematized Nomenclature of Medicine; Semantics
6.
Stud Health Technol Inform ; 316: 272-276, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176725

ABSTRACT

The task of Named Entity Recognition (NER) is central to leveraging the content of clinical texts in observational studies. Indeed, texts contain a large part of the information available in Electronic Health Records (EHRs). However, clinical texts are highly heterogeneous between healthcare services and institutions, and between countries and languages, making it hard to predict how existing tools will perform on a particular corpus. We compared four NER approaches on three French corpora and share our benchmarking pipeline in an open and easy-to-reuse manner, using the medkit Python library. Our pipelines include fine-tuning operations with one or several of the considered corpora. Our results illustrate the expected superiority of language models over a dictionary-based approach and question the necessity of refining models already trained on biomedical texts. Beyond benchmarking, we believe that sharing reusable and customizable pipelines for comparing fast-evolving Natural Language Processing (NLP) tools is a valuable contribution, since clinical texts themselves can hardly be shared due to privacy concerns.


Subject(s)
Electronic Health Records; Natural Language Processing; France; Humans
7.
Stud Health Technol Inform ; 316: 601-605, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176814

ABSTRACT

Generative Large Language Models (LLMs) have become ubiquitous in various fields, including healthcare and medicine. Consequently, there is growing interest in leveraging LLMs for medical applications, leading to the emergence of novel models daily. However, evaluation and benchmarking frameworks for LLMs are scarce, particularly those tailored to medical French. To address this gap, we introduce a minimal benchmark consisting of 114 open questions designed to assess the medical capabilities of LLMs in French. The proposed benchmark encompasses a wide range of medical domains, reflecting the complexity of real-world clinical scenarios. A preliminary validation involved testing seven widely used LLMs, each with roughly 7 billion parameters. Results revealed significant variability in performance, emphasizing the importance of rigorous evaluation before deploying LLMs in medical settings. In conclusion, we present a novel and valuable resource for rapidly evaluating LLMs in medical French. By promoting greater accountability and standardization, this benchmark has the potential to enhance the trustworthiness and utility of LLMs harnessed for medical applications.


Subject(s)
Benchmarking; Humans; France
8.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39154193

ABSTRACT

Cell segmentation is a fundamental task in analyzing biomedical images. Many computational methods have been developed for cell and instance segmentation, but their performance is not well understood across scenarios. We systematically evaluated 18 segmentation methods for cell nucleus and whole-cell segmentation using light microscopy and fluorescence staining images. We found that general-purpose methods incorporating the attention mechanism exhibit the best overall performance. We identified various factors influencing segmentation performance, including image channels, choice of training data, and cell morphology, and evaluated the generalizability of methods across image modalities. We also provide guidelines for choosing the optimal segmentation method in various real application scenarios. Finally, we developed Seggal, an online resource for downloading segmentation models pre-trained on various tissue and cell types, substantially reducing the time and effort required to train cell segmentation models.
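As a minimal illustration of the kind of mask-overlap scoring used in such evaluations, intersection-over-union for a predicted and a reference cell mask might be computed as follows (synthetic masks, not the paper's pipeline):

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(inter / union) if union else 1.0  # empty masks: perfect match

pred = np.zeros((64, 64), dtype=bool);  pred[10:30, 10:30] = True
truth = np.zeros((64, 64), dtype=bool); truth[12:32, 12:32] = True
print(f"IoU = {iou(pred, truth):.3f}")             # overlap of the two masks
```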


Subject(s)
Image Processing, Computer-Assisted; Humans; Image Processing, Computer-Assisted/methods; Computational Biology/methods; Algorithms; Cell Nucleus
9.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39120646

ABSTRACT

Cell-type annotation is a critical step in single-cell data analysis. With the development of numerous cell annotation methods, it is necessary to evaluate these methods to help researchers use them effectively. Reference datasets are essential for such evaluation, but the cell labels of current reference datasets mainly come from computational methods, which may carry computational biases and may not reflect the actual cell types. In this study, we first constructed an experimentally labeled immune cell-subtype single-cell dataset from a single batch and then systematically evaluated 18 cell annotation methods. We assessed these methods under five scenarios: intra-dataset validation, immune cell-subtype validation, unsupervised clustering, inter-dataset annotation, and unknown cell-type prediction, using accuracy and the adjusted Rand index (ARI) as evaluation metrics. The results showed that SVM, scBERT, and scDeepSort were the best-performing supervised methods. Seurat was the best-performing unsupervised clustering method, but it could not fully recover the actual cell-type distribution. Our results indicate that experimentally labeled immune cell-subtype datasets reveal the deficiencies of unsupervised clustering methods and provide new dataset support for supervised methods.
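Both metrics named above are available in scikit-learn; a toy sketch with invented labels (not the study's immune-cell data) might look like this:

```python
from sklearn.metrics import accuracy_score, adjusted_rand_score

truth     = ["B", "B", "T", "T", "NK", "NK"]
predicted = ["B", "T", "T", "T", "NK", "NK"]

print(accuracy_score(truth, predicted))        # scores supervised annotation
print(adjusted_rand_score(truth, predicted))   # scores unsupervised clustering,
                                               # invariant to cluster relabeling
```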


Subject(s)
Single-Cell Analysis; Single-Cell Analysis/methods; Humans; Cluster Analysis; Computational Biology/methods; Molecular Sequence Annotation; RNA-Seq/methods; Single-Cell Gene Expression Analysis
10.
Article in English | MEDLINE | ID: mdl-39122095

ABSTRACT

BACKGROUND AND PURPOSE: STereotactic Arrhythmia Radioablation (STAR) has shown promising results in patients with refractory ventricular tachycardia (VT). However, clinical data are scarce and heterogeneous. The STOPSTORM.eu consortium was established to investigate and harmonize STAR in Europe. The primary goal of this benchmark study was to investigate current treatment planning practice within the STOPSTORM project as a baseline for future harmonization. METHODS: Planning target volumes (PTVs) overlapping extra-cardiac organs-at-risk and/or cardiac substructures were generated for three STAR cases. Participating centers were asked to create single-fraction treatment plans with a 25 Gy dose prescription based on in-house clinical practice. All treatment plans were reviewed by an expert panel, and a quantitative crowd knowledge-based analysis was performed with independent software, using descriptive statistics for ICRU Report 91-relevant parameters and crowd dose-volume histograms. Thereafter, treatment planning consensus statements were established using a dual-stage voting process. RESULTS: Twenty centers submitted 67 treatment plans for this study. Most plans (75%) used Intensity Modulated Arc Therapy (IMAT) with 6 MV flattening-filter-free beams. Dose prescription was mainly based on PTV D95% (49%) or D96-100% (19%). Many participants preferred to spare close extra-cardiac organs-at-risk (75%) and cardiac substructures (50%) by reducing PTV coverage. PTV D0.035cm3 ranged from 25.5 to 34.6 Gy, demonstrating a large variety of dose inhomogeneity. Estimated treatment times without motion compensation or setup ranged from 2 to 80 minutes. In the consensus statements, strong agreement was reached on beam technique planning, dose calculation, prescription methods, and trade-offs between target and extra-cardiac critical structures. No agreement was reached on cardiac substructure dose limits or on the desired dose inhomogeneity in the target. CONCLUSION: This STOPSTORM multi-center treatment planning benchmark study showed strong agreement on several aspects of STAR treatment planning but also revealed disagreement on others. To standardize and harmonize STAR in the future, consensus statements were established; however, clinical data are urgently needed to produce actionable treatment planning guidelines.
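For illustration, the two dose-volume metrics central to this study, PTV D95% and D0.035cm3, can be computed from a flattened dose array as sketched below; the dose values and voxel size are synthetic, not taken from the submitted plans:

```python
import numpy as np

rng = np.random.default_rng(1)
ptv_dose = rng.normal(25.0, 1.5, size=20000)   # Gy, voxels inside the PTV
voxel_cm3 = 0.001                              # assume 1 mm^3 voxels

# D95%: dose received by at least 95% of the volume = 5th dose percentile
d95 = np.percentile(ptv_dose, 5)

# D0.035cm3: minimum dose within the hottest 0.035 cm^3 (near-maximum dose)
n_hot = max(1, int(round(0.035 / voxel_cm3)))
d0035 = np.sort(ptv_dose)[-n_hot]

print(f"D95% = {d95:.1f} Gy, D0.035cm3 = {d0035:.1f} Gy")
```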

11.
Med Image Anal ; 97: 103285, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39116766

ABSTRACT

We introduce the largest abdominal CT dataset to date (termed AbdomenAtlas), comprising 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673,000 high-quality masks of anatomical structures in the abdominal region, annotated by a team of 10 radiologists with the help of AI algorithms. We started by having expert radiologists manually annotate 22 anatomical structures in 5,246 CT volumes. Following this, a semi-automatic annotation procedure was performed for the remaining CT volumes, in which radiologists revised annotations predicted by AI and the AI, in turn, improved its predictions by learning from the revised annotations. Such a large-scale, densely annotated, multi-center dataset is needed for two reasons. Firstly, AbdomenAtlas provides important resources for AI development at scale, in the form of large pre-trained models, which can reduce the annotation workload of expert radiologists when transferring to broader clinical applications. Secondly, AbdomenAtlas establishes a large-scale benchmark for evaluating AI algorithms: the more data we use to test an algorithm, the better we can guarantee reliable performance in complex clinical scenarios. An ISBI & MICCAI challenge named "BodyMaps: Towards 3D Atlas of Human Body" was launched using a subset of AbdomenAtlas, aiming to stimulate AI innovation and to benchmark segmentation accuracy, inference efficiency, and domain generalizability. We hope AbdomenAtlas can set the stage for larger-scale clinical trials and offer exceptional opportunities to practitioners in the medical imaging community. Code, models, and datasets are available at https://www.zongweiz.com/dataset.

12.
J Comput Biol ; 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39117342

ABSTRACT

Recent technological advances have enabled spatially resolved transcriptomic profiling, although the more cost-effective platforms operate at multicellular resolution. The task of cell-type deconvolution was introduced to disentangle discrete cell types from such multicellular spots. However, existing benchmark datasets for cell-type deconvolution are either generated from simulation or limited in scale, predominantly encompass mouse data, and are not designed for human immuno-oncology. To overcome these limitations and enable comprehensive investigation of cell-type deconvolution for human immuno-oncology, we introduce a large-scale spatial transcriptomic deconvolution benchmark dataset named SpatialCTD, encompassing 1.8 million cells and 12,900 pseudo spots from the human tumor microenvironment across the lung, kidney, and liver. In addition, SpatialCTD provides a more realistic reference than references generated from single-cell RNA sequencing (scRNA-seq) data for most reference-based deconvolution methods. To exploit the location-aware SpatialCTD reference, we propose a graph neural network-based deconvolution method, GNNDeconvolver. Extensive experiments show that GNNDeconvolver often outperforms existing state-of-the-art methods by a substantial margin without requiring scRNA-seq data. To enable comprehensive evaluation of spatial transcriptomics data from flexible protocols, we provide an online tool capable of converting spatial transcriptomic data from various platforms (e.g., 10× Visium, MERFISH, and sci-Space) into pseudo spots with adjustable spot size. The SpatialCTD dataset and the GNNDeconvolver implementation are available at https://github.com/OmicsML/SpatialCTD, and the online converter tool can be accessed at https://omicsml.github.io/SpatialCTD/.

13.
J Comput Chem ; 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39134305

ABSTRACT

The development of novel methods in solid-state quantum chemistry necessitates reliable reference data sets for their assessment. The most fundamental solid-state property of interest is the crystal structure, quantified by the lattice parameters. In the last decade, several studies assessed theoretical approaches using the agreement of calculated lattice parameters with experiment as the measure. However, most of these studies used a limited number of high-symmetry reference systems. The present work offers a more comprehensive reference benchmark, denoted Sol337LC, consisting of 337 inorganic compounds with 553 symmetry-inequivalent lattice parameters and representing every element of the periodic table with atomic number between 1 and 86, except the noble gases, radioactive elements, and lanthanoids. The reference values were taken from earlier benchmarks and from measurements at very low temperature or extrapolations to 0 K. The experimental low-temperature lattice parameters were then corrected for zero-point energy effects via the quasi-harmonic approximation to allow direct comparison with quantum-chemically optimized structures. A selection of standard density functional approximations was assessed by their deviations from the experimental reference data. The calculations were performed with the crystal orbital program CRYSTAL23, applying optimized atom-centered basis sets of triple-zeta-plus-polarization quality. The SCAN functional family and the global hybrid functional PW1PW, augmented with the D3 dispersion correction, were found to provide the closest agreement with the Sol337LC reference data.

14.
J Hazard Mater ; 478: 135527, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39151363

ABSTRACT

Disposable paper cups (DPCs) release millions of microplastics (MPs) when used for hot beverages. However, the tissue-specific deposition and toxic effects of MPs and associated toxins remain largely unexplored, especially at daily consumption levels. We administered MPs and associated toxins extracted from leading-brand DPCs to pregnant mice, revealing dose-responsive harmful effects on fetal development and maternal physiology. MPs were detected in all 13 examined tissues, with preferential deposition in the fetus, placenta, kidney, spleen, lung, and heart, contributing to impaired phenotypes. Brain tissues contained the smallest MPs (90.35% < 10 µm). A dose-responsive shift in the cecal microbiome from Firmicutes to Bacteroidetes was observed, coupled with enhanced biosynthesis of microbial fatty acids. A moderate consumption of 3.3 cups daily was sufficient to alter the cecal microbiome, global metabolic functions, and immune health, as reflected by tissue-specific transcriptomic analyses of maternal blood, placenta, and mammary glands, raising risks of neurodegeneration and miscarriage. A gene-based benchmark dose framework analysis suggested a safe exposure limit of 2 to 4 cups/day in pregnant mice. Our results highlight tissue-specific accumulation and metabolic and reproductive toxicity in mice at DPC consumption levels presumed non-hazardous, with potential health implications for pregnant women and fetuses.

15.
Arch Toxicol ; 2024 Aug 17.
Article in English | MEDLINE | ID: mdl-39153032

ABSTRACT

Mono-n-hexyl phthalate (MnHexP) is a primary metabolite of di-n-hexyl phthalate (DnHexP) and other mixed side-chain phthalates that was recently detected in urine samples from adults and children in Germany. DnHexP is classified as toxic for reproduction category 1B in Annex VI of Regulation (EC) 1272/2008 and listed in Annex XIV of the European chemical legislation REACH; its use therefore requires authorisation. Health-based guidance values for DnHexP are lacking, and a full-scale risk assessment has not been carried out under REACH. The detection of MnHexP in urine samples raises questions about the sources of exposure and concerns about consumer safety. Here, we derive a provisional oral tolerable daily intake (TDI) of 63 µg/kg body weight/day for DnHexP and compare it with intake levels corresponding to the levels of MnHexP found in urine. The resulting mean intake levels correspond to less than 0.2% of the TDI, and the maximum levels to less than 5%. The TDI was derived by means of an approximate probabilistic analysis using the credible interval from benchmark dose modelling of published ex vivo data on reduced foetal testosterone production in rats. For the dose associated with a 20% reduction in testosterone production, a lower and upper credible interval of 14.9 and 30.0 mg/kg bw/day, respectively, was used. This is considered a conservative approach, since apical developmental endpoints (e.g. changed anogenital distance) were only observed at higher doses. In addition, we modelled various scenarios of exposure to the precursor substance DnHexP from different consumer products, taking measured contamination levels into account, and estimated systemic exposure doses. Among the modelled scenarios, which included the application of sunscreen (as a lotion or pump spray), the use of lip balm, and the wearing of plastic sandals, the use of DnHexP-contaminated sunscreen was highlighted as the major contributing factor under conservative assumptions. A hypothetical calculation using conservative assumptions for the latter resulted in margins of safety, relative to the lower credible interval, of 3267 for adults and 1007 for young children. Most importantly, only a fraction of the TDI is reached in all studied exposure scenarios. Thus, with regard to the reported DnHexP exposure, a health risk can be considered very unlikely.

16.
Genome Biol ; 25(1): 225, 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39152456

ABSTRACT

BACKGROUND: Single-cell chromatin accessibility assays, such as scATAC-seq, are increasingly employed in individual and joint multi-omic profiling of single cells. As scATAC-seq and multi-omics datasets continue to accumulate, analyzing such sparse, noisy, and high-dimensional data becomes pressing. One specific challenge is optimizing the processing of chromatin-level measurements and efficiently extracting the information needed to discern cellular heterogeneity. This is of critical importance, since the identification of cell types is a fundamental step in current single-cell data analysis practice. RESULTS: We benchmark 8 feature engineering pipelines derived from 5 recent methods to assess their ability to discover and discriminate cell types. Using 10 metrics calculated at the cell-embedding, shared-nearest-neighbor-graph, or partition level, we evaluate the performance of each method at different data processing stages. This comprehensive approach allows us to thoroughly understand the strengths and weaknesses of each method and the influence of parameter selection. CONCLUSIONS: Our analysis provides guidelines for choosing analysis methods for different datasets. Overall, feature aggregation, SnapATAC, and SnapATAC2 outperform latent semantic indexing-based methods. For datasets with complex cell-type structures, SnapATAC and SnapATAC2 are preferred. For large datasets, SnapATAC2 and ArchR are the most scalable.


Subject(s)
Benchmarking; Chromatin; Single-Cell Analysis; Single-Cell Analysis/methods; Chromatin/genetics; Chromatin/metabolism; Humans; Computational Biology/methods
17.
Sci Total Environ ; 949: 175245, 2024 Nov 01.
Article in English | MEDLINE | ID: mdl-39098426

ABSTRACT

Accurate snow cover data are crucial for understanding climate change, managing water resources, and calibrating models. MODIS (Moderate-resolution Imaging Spectroradiometer) and its cloud-free snow cover datasets are widely used, but they have not been systematically evaluated because of differing benchmark data and evaluation parameters. Conventional methods using station observations as ground truth suffer from underrepresentation and from mismatches in temporal and spatial scale. This study established a scale-matched spatial benchmark dataset compiled from 18,433 Landsat-series and 11,172 Sentinel-2 images over two decades, totaling ∼1.86 billion samples, of which ∼320 million are snow samples. We evaluated seven MODIS cloud-free snow cover datasets against this benchmark across seasons, elevation zones, land cover types, and subregions. For the clear-sky portion, NIEER_MODIS_SCE (the MODIS snow cover extent product over China) performs best because it uses optimal NDSI thresholds tailored to each land use type. This highlights the importance of regional customization in snow mapping algorithms; the product can be further improved in spring, in forests, and in elevation zone 1 by combining it with M*D10A1GL06. For the cloud-removed portion, one-step integrated spatiotemporal cloud removal outperforms every other approach. The second-best dataset comes from a simple but effective single-temporal cloud removal method using nearby-time information. Over the whole dataset, the best product, NIEER_MODIS_SCE, has an overall accuracy of 0.82 and a snow retrieval accuracy of 84.56%. It performs well in most settings but weakest in forests, where more effective strategies are required. This research provides new perspectives and methods for objectively assessing MODIS snow cover products and other relevant datasets; the methods can be readily extended to other regions and adapted to future satellite missions, and the findings may guide the selection of more reliable snow cover data and the development of better snow detection strategies.
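As a sketch of the two headline scores, overall accuracy and snow retrieval accuracy (recall on the snow class) can be computed from a binary confusion matrix as below; the counts are invented, not the benchmark's actual samples:

```python
import numpy as np

# rows = ground truth (snow, no-snow), cols = product prediction
cm = np.array([[846, 154],    # snow pixels: 846 detected, 154 missed
               [120, 880]])   # no-snow pixels: 120 false alarms

overall_accuracy = np.trace(cm) / cm.sum()   # all correct / all samples
snow_recall = cm[0, 0] / cm[0].sum()         # detected snow / true snow
print(overall_accuracy, snow_recall)         # ~0.86 and ~0.85 here
```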

18.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-39115959

ABSTRACT

BACKGROUND: Sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA from wastewater samples has emerged as a valuable tool for detecting the presence and relative abundances of SARS-CoV-2 variants in a community. By analyzing the viral genetic material present in wastewater, researchers and public health authorities can gain early insights into the spread of virus lineages and emerging mutations. Constructing reference datasets from known SARS-CoV-2 lineages and their mutation profiles has become state of the art for assigning viral lineages and their relative abundances from wastewater sequencing data. However, the selection of reference sequences or mutations directly affects the predictive power. RESULTS: Here, we show the impact of mutation-based and sequence-based reference reconstruction on SARS-CoV-2 abundance estimation. We benchmark 3 datasets: (i) synthetic "spike-in" mixtures; (ii) German wastewater samples from early 2021, mainly comprising Alpha; and (iii) samples obtained from wastewater at an international airport in Germany at the end of 2021, including first signals of Omicron. The 2 approaches differ in sublineage detection, with the marker mutation-based method in particular being challenged by the increasing number of mutations and lineages. However, the estimates of both approaches depend on the selection of representative references and optimized parameter settings. By performing parameter escalation experiments, we demonstrate the effects of reference size and of alternative allele frequency cutoffs on abundance estimation. We show how different parameter settings can lead to different results for our test datasets and illustrate the effects of the virus lineage composition of wastewater samples and references. CONCLUSIONS: Our study highlights current computational challenges, focusing on the general reference design, which directly impacts abundance allocations. We illustrate advantages and disadvantages that may be relevant for further developments in the wastewater community and in the context of defining robust quality metrics.
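A minimal sketch of the reference-based idea described above: observed allele frequencies are modeled as a non-negative mixture of lineage mutation profiles and solved by non-negative least squares. The profiles and frequencies below are invented, and the actual tools use considerably richer models:

```python
import numpy as np
from scipy.optimize import nnls

# rows = marker mutations, cols = lineages (e.g., Alpha, Delta, Omicron)
profiles = np.array([[1, 0, 0],
                     [1, 1, 0],
                     [0, 1, 1],
                     [0, 0, 1]], dtype=float)
observed = np.array([0.6, 0.7, 0.4, 0.3])   # wastewater allele frequencies

weights, _ = nnls(profiles, observed)        # non-negative mixture weights
print(weights / weights.sum())               # relative lineage abundances
```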


Subject(s)
COVID-19; Mutation; SARS-CoV-2; Wastewater; SARS-CoV-2/genetics; SARS-CoV-2/isolation & purification; Wastewater/virology; Humans; COVID-19/virology; COVID-19/epidemiology; RNA, Viral/genetics; Genome, Viral
19.
Heliyon ; 10(14): e34326, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39108910

ABSTRACT

This article introduces an application of the Enhanced Gorilla Troops Algorithm (EGTA) to an engineering challenge: the allocation of Thyristor Controlled Series Capacitors (TCSC) in power grids. Drawing inspiration from gorilla group behaviors, GTA incorporates operators such as relocation to new areas, movement towards other gorillas, migration to specific locations, following the silverback, and competition for adult females. EGTA augments these with two additional strategies, a periodic Tangent Flight Operator (TFO) and a Fitness-based Crossover Strategy (FCS), which support exploitation and exploration, respectively. The paper first evaluates the effectiveness of EGTA by comparing it with the original GTA on the numerical CEC 2017 single-objective benchmarks, alongside various recent optimizers. Subsequently, the suitability of EGTA for allocating TCSC devices in transmission power systems is assessed through simulations on two IEEE power grids of 30 and 57 buses, with varying numbers of TCSC devices. A comprehensive comparison is conducted between EGTA, GTA, and several other prevalent techniques from the literature across all applications. In terms of average attained losses, EGTA delivers notable reductions in power losses for both systems compared with the original GTA. For the first system, EGTA achieves reductions of 1.659%, 2.545%, and 4.6% when optimizing one, two, and three TCSC devices, respectively; for the second system, the corresponding reductions are 6.096%, 7.107%, and 4.62%. These findings underscore the effectiveness and efficiency of the proposed EGTA over both the original GTA and several other contemporary optimizers.

20.
bioRxiv ; 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39149320

ABSTRACT

The quantification of cardiac strains as structural indices of cardiac function is increasingly prevalent in clinical diagnosis. However, highly heterogeneous four-dimensional (4D) cardiac motion challenges accurate "regional" strain quantification and leads to sizable differences in the estimated strains depending on the imaging modality and post-processing algorithm, limiting the translational potential of strains as incremental biomarkers of cardiac dysfunction. There remains a crucial need for a feasible benchmark that successfully replicates complex 4D cardiac kinematics in order to determine the reliability of strain calculation algorithms. In this study, we propose an in-silico heart phantom derived from finite element (FE) simulations to validate the quantification of 4D regional strains. First, as a proof-of-concept exercise, we created synthetic magnetic resonance (MR) images for a hollow thick-walled cylinder under pure torsion, for which an exact solution exists, and demonstrated that "ground-truth" values can be recovered for the twist angle, a key kinematic index in the heart. Next, we used mouse-specific FE simulations of cardiac kinematics to synthesize dynamic MR images by sampling various sectional planes of the left ventricle (LV). Strains were calculated in both problems using our recently developed non-rigid image registration (NRIR) framework. Moreover, we studied how image quality distorts regional strain calculations by conducting in-silico experiments for various LV configurations. Our studies offer a rigorous and feasible tool for standardizing regional strain calculations to improve their clinical impact as incremental biomarkers.
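For context, a commonly used regional strain measure, the Green-Lagrange strain tensor, can be computed from a deformation gradient as sketched below; the simple shear F is chosen only for illustration and is not the paper's torsion solution:

```python
import numpy as np

F = np.array([[1.00, 0.10, 0.0],     # deformation gradient with a small shear
              [0.00, 1.00, 0.0],
              [0.00, 0.00, 1.0]])

E = 0.5 * (F.T @ F - np.eye(3))      # Green-Lagrange strain: E = (F^T F - I)/2
print(E)                             # E[0,1] = 0.05 is the shear strain component
```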
