1.
BMC Bioinformatics ; 25(1): 180, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720249

ABSTRACT

BACKGROUND: High-throughput sequencing (HTS) has become the gold standard approach for variant analysis in cancer research. However, somatic variants may occur at low fractions due to contamination from normal cells or tumor heterogeneity; this poses a significant challenge for standard HTS analysis pipelines. The problem is exacerbated in scenarios with minimal tumor DNA, such as circulating tumor DNA in plasma. Assessing sensitivity and detection of HTS approaches in such cases is paramount, but time-consuming and expensive: specialized experimental protocols and a sufficient quantity of samples are required for processing and analysis. To overcome these limitations, we propose a new computational approach specifically designed for the generation of artificial datasets suitable for this task, simulating ultra-deep targeted sequencing data with low-fraction variants and demonstrating their effectiveness in benchmarking low-fraction variant calling. RESULTS: Our approach enables the generation of artificial raw reads that mimic real data without relying on pre-existing data by using NEAT, a fine-grained read simulator that generates artificial datasets using models learned from multiple different datasets. Then, it incorporates low-fraction variants to simulate somatic mutations in samples with minimal tumor DNA content. To prove the suitability of the created artificial datasets for low-fraction variant calling benchmarking, we used them as ground truth to evaluate the performance of widely used variant calling algorithms: they allowed us to define tuned parameter values for major variant callers, considerably improving their detection of very low-fraction variants. CONCLUSIONS: Our findings highlight both the pivotal role of our approach in creating adequate artificial datasets with low tumor fraction, which facilitates rapid prototyping and benchmarking of algorithms for this type of data, and the important need to advance low-fraction variant calling techniques.
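The core mechanic described above, injecting a variant at a controlled allele fraction into simulated reads, can be sketched as follows. This is a minimal illustration under assumed data structures (a list of `(start, sequence)` read tuples and a `spike_in_variant` helper), not the authors' NEAT-based pipeline.

```python
# Minimal sketch (not the authors' NEAT-based pipeline): spike a low-fraction
# SNV into simulated reads overlapping a target position. Names are illustrative.
import random

def spike_in_variant(reads, pos, ref_base, alt_base, target_vaf, seed=0):
    """Mutate a random subset of overlapping reads so that the alt allele
    appears at roughly `target_vaf` (e.g. 0.005 for a 0.5% variant)."""
    rng = random.Random(seed)
    carriers = []
    for i, (start, seq) in enumerate(reads):
        overlaps = start <= pos < start + len(seq)
        # Each overlapping read carries the alt allele with probability
        # target_vaf, mimicking binomial sampling of tumor-derived fragments.
        if overlaps and rng.random() < target_vaf:
            offset = pos - start
            assert seq[offset] == ref_base, "reference mismatch at target site"
            reads[i] = (start, seq[:offset] + alt_base + seq[offset + 1:])
            carriers.append(i)
    return carriers  # indices of reads now supporting the variant

# Example: 10,000x ultra-deep coverage of a 150 bp window, 0.5% variant at pos 75.
reads = [(0, "A" * 150) for _ in range(10_000)]
carriers = spike_in_variant(reads, pos=75, ref_base="A", alt_base="T", target_vaf=0.005)
print(f"{len(carriers)} of {len(reads)} reads support the variant "
      f"(observed VAF = {len(carriers) / len(reads):.4f})")
```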


Subject(s)
Benchmarking , High-Throughput Nucleotide Sequencing , Neoplasms , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/genetics , Mutation , Algorithms , DNA, Neoplasm/genetics , Sequence Analysis, DNA/methods , Computational Biology/methods
2.
J Healthc Manag ; 69(3): 178-189, 2024.
Article in English | MEDLINE | ID: mdl-38728544

ABSTRACT

GOAL: A lack of improvement in productivity in recent years may be the result of suboptimal measurement of productivity. Hospitals and clinics benefit from external benchmarks that allow assessment of clinical productivity. Work relative value units have long served as a common currency for this purpose. Productivity is determined by comparing work relative value units to full-time equivalents (FTEs), but FTEs lack a universal, standardized definition, which can distort productivity comparisons. We propose a new clinical labor input measure, "clinic time," as a substitute for the reported measure of FTEs. METHODS: In this observational validation study, we used data from a cluster randomized trial to compare FTEs with clinic time. We compared these two productivity measures graphically. For validation, we estimated two separate ordinary least squares (OLS) regression models. To validate and simultaneously adjust for endogeneity, we used instrumental variables (IV) regression with the proportion of days in a pay period that were federal holidays as an instrument. We used productivity data collected between 2018 and 2020 from Veterans Health Administration (VA) cardiology and orthopedics providers as part of a 2-year cluster randomized trial of medical scribes mandated by the VA Maintaining Internal Systems and Strengthening Integrated Outside Networks (MISSION) Act of 2018. PRINCIPAL FINDINGS: Our cohort included 654 unique providers. For both productivity variables, the values for patients per clinic day were consistently higher than those for patients per day per FTE. To validate these measures, we estimated separate OLS and IV regression models, predicting wait times from the two productivity measures. The slopes from the two productivity measures were positive and small in magnitude with OLS, but negative and large in magnitude with IV regression. The magnitude of the slope for patients per clinic day was much larger than the slope for patients per day per FTE. Current metrics that rely on FTE data may suffer from self-report bias and low reporting frequency. Using clinic time as an alternative is an effective way to mitigate these biases. PRACTICAL APPLICATIONS: Measuring productivity accurately is essential because provider productivity plays an important role in facilitating clinic operations outcomes. Most importantly, tracking a more valid productivity metric is a concrete, cost-effective management tactic to improve the provision of care in the long term.
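The instrumental-variables step described above can be illustrated with a manual two-stage least squares fit on synthetic data; the variable names (`holiday_share`, `productivity`, `wait_time`) and the data-generating assumptions are placeholders, not the study's actual VA fields or estimates.

```python
# Sketch of the IV idea on synthetic data (manual two-stage least squares).
# Variable names and effect sizes are placeholders, not the study's results.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 654  # number of providers in the study cohort

holiday_share = rng.uniform(0.0, 0.15, n)   # instrument: share of federal holidays in pay period
unobserved = rng.normal(0, 1, n)            # confounder that creates endogeneity
productivity = 10 - 20 * holiday_share + unobserved + rng.normal(0, 1, n)
wait_time = 30 - 1.5 * productivity + 2 * unobserved + rng.normal(0, 2, n)

# Naive OLS: wait time on (endogenous) productivity -- biased by the confounder.
ols = sm.OLS(wait_time, sm.add_constant(productivity)).fit()

# Stage 1: regress the endogenous productivity measure on the instrument.
stage1 = sm.OLS(productivity, sm.add_constant(holiday_share)).fit()
productivity_hat = stage1.fittedvalues

# Stage 2: regress wait time on the fitted (exogenous) part of productivity.
iv = sm.OLS(wait_time, sm.add_constant(productivity_hat)).fit()

print(f"OLS slope: {ols.params[1]:.2f}  |  IV (2SLS) slope: {iv.params[1]:.2f}")
# The IV slope recovers the assumed -1.5 effect; note that standard errors from
# this manual second stage are not valid and need the usual 2SLS correction.
```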


Subject(s)
Efficiency, Organizational , Humans , United States , Efficiency , United States Department of Veterans Affairs , Benchmarking , Female , Relative Value Scales , Male
3.
J Healthc Manag ; 69(3): 219-230, 2024.
Article in English | MEDLINE | ID: mdl-38728547

ABSTRACT

GOAL: Boarding emergency department (ED) patients is associated with reductions in quality of care, patient safety and experience, and ED operational efficiency. However, ED boarding is ultimately reflective of inefficiencies in hospital capacity management. The ability of a hospital to accommodate variability in patient flow presumably affects its financial performance, but this relationship is not well studied. We investigated the relationship between ED boarding and hospital financial performance measures. Our objective was to determine whether there was an association between key financial measures of business performance and limitations in patient progression efficiency, as evidenced by ED boarding. METHODS: Cross-sectional ED operational data were collected from the Emergency Department Benchmarking Alliance, a voluntarily self-reporting operational database that includes 54% of EDs in the United States. Freestanding EDs, pediatric EDs and EDs with missing boarding data were excluded. The key operational outcome variable was boarding time. We reviewed the financial information of these nonprofit institutions by accessing their Internal Revenue Service Form 990. We examined standard measures of financial performance, including return on equity, total margin, total asset turnover, and equity multiplier (EM). We studied these associations using quantile regressions, with ED volume, ED admission percentage, urban versus nonurban ED site location, trauma status, and the percentage of the population receiving Medicare and Medicaid as covariates in the regression models. PRINCIPAL FINDINGS: Operational data were available for 892 EDs from 31 states. Of those, 127 reported a Form 990 in the year corresponding to the ED boarding measures. Median boarding time across EDs was 148 min (interquartile range [IQR]: 100-216). A significant relationship exists between boarding and the EM, along with a negative association with the hospital's total profit margin in the highest-performing hospitals (by profit margin percentage). After adjusting for the covariates in the regression model, we found that for every 10 min above 90 min of boarding, the mean EM for the top quartile increased from 245.8% to 249.5% (p < .001). In hospitals in the top 90th percentile of total margin, every 10 min beyond the median ED boarding interval led to a decrease in total margin of 0.24%. PRACTICAL APPLICATIONS: In the largest available national registry of ED operational data and the concordant nonprofit financial reports, higher boarding among the highest-profitability hospitals (i.e., the top 10%) is associated with a drag on profit margin, while the hospitals with the highest boarding show the highest leverage (as indicated by the EM). These relationships suggest an association between a key ED indicator of hospital capacity management and overall institutional financial performance.
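The quantile-regression setup described above can be sketched with statsmodels on synthetic data; the column names and the assumed effect sizes are illustrative placeholders, not the EDBA or Form 990 variables.

```python
# Minimal sketch of a quantile regression of profit margin on boarding time,
# in the spirit of the analysis above (synthetic data, placeholder covariates;
# not the EDBA/Form 990 dataset itself).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 127  # EDs with a matching Form 990 in the study

df = pd.DataFrame({
    "boarding_min": rng.gamma(shape=3.0, scale=55.0, size=n),  # right-skewed boarding times
    "ed_volume": rng.normal(50_000, 15_000, size=n),
    "admit_pct": rng.uniform(10, 35, size=n),
})
# Assumed data-generating process: a small negative effect of boarding on margin.
df["total_margin_pct"] = 5 - 0.004 * df["boarding_min"] + rng.normal(0, 3, size=n)

model = smf.quantreg("total_margin_pct ~ boarding_min + ed_volume + admit_pct", df)
for q in (0.5, 0.9):  # median and top-margin (90th percentile) regressions
    res = model.fit(q=q)
    print(f"q={q:.1f}: boarding slope = {res.params['boarding_min']:.4f} "
          f"margin points per additional minute")
```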


Subject(s)
Efficiency, Organizational , Emergency Service, Hospital , Emergency Service, Hospital/statistics & numerical data , Emergency Service, Hospital/economics , Cross-Sectional Studies , United States , Humans , Efficiency, Organizational/economics , Benchmarking
4.
BMC Genom Data ; 25(1): 45, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714942

ABSTRACT

OBJECTIVES: Cellular deconvolution is a valuable computational process that can infer the cellular composition of heterogeneous tissue samples from bulk RNA-sequencing data. Benchmark testing is a crucial step in the development and evaluation of new cellular deconvolution algorithms, and also plays a key role in the process of building and optimizing deconvolution pipelines for specific experimental applications. However, few in vivo benchmarking datasets exist, particularly for whole blood, which is the single most profiled human tissue. Here, we describe a unique dataset containing whole blood gene expression profiles and matched circulating leukocyte counts from a large cohort of human donors with utility for benchmarking cellular deconvolution pipelines. DATA DESCRIPTION: To produce this dataset, venous whole blood was sampled from 138 total donors recruited at an academic medical center. Genome-wide expression profiling was subsequently performed via next-generation RNA sequencing, and white blood cell differentials were collected in parallel using flow cytometry. The resultant final dataset contains donor-level expression data for over 45,000 protein coding and non-protein coding genes, as well as matched neutrophil, lymphocyte, monocyte, and eosinophil counts.


Subject(s)
Benchmarking , Humans , Leukocyte Count , Gene Expression Profiling/methods , Transcriptome , Sequence Analysis, RNA/methods , Leukocytes/metabolism , High-Throughput Nucleotide Sequencing , Algorithms
5.
BMC Public Health ; 24(1): 1245, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38711106

ABSTRACT

OBJECTIVE: To benchmark the university food environment and explore students' experiences with food insecurity and healthy eating in order to inform interventions to improve the access to and affordability of healthy foods for university students. DESIGN: A food environment audit was conducted on the university campus using the Uni-Food tool from April to May 2022 and comprised three main components: university systems and governance, campus facilities and environment, and food retail outlets. A qualitative study design was also used to conduct focus groups and semi-structured interviews with students to explore key themes regarding their experiences with food insecurity and healthy eating. SETTING: Macquarie University, Australia. PARTICIPANTS: The food environment audit covered 24 retail outlets on campus; the qualitative component included 29 domestic and international students enrolled at Macquarie University. RESULTS: The university scored only 27% in total across all components of the food environment audit. The results showed the need for better governance and leadership of the food environment. The qualitative component suggested that the main barriers to accessing healthy foods were related to availability, pricing, and knowledge of healthy foods. Future intervention ideas included free fruits and vegetables, food relief, discounts, improved self-catering facilities, education, and increased healthy food outlets. CONCLUSIONS: Improving governance measures related to healthy eating on campus is a core priority for strengthening the food environment, and students identified pricing and availability as key issues. These findings will inform effective and feasible interventions to improve food security and healthy eating on campus.


Subject(s)
Benchmarking , Diet, Healthy , Food Insecurity , Qualitative Research , Students , Humans , Universities , Students/psychology , Students/statistics & numerical data , Diet, Healthy/psychology , Female , Male , Australia , Young Adult , Focus Groups , Adult , Organizational Case Studies , Food Supply/statistics & numerical data
6.
Nat Commun ; 15(1): 4055, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744843

ABSTRACT

We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, performing in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks, providing gold-standard ground truth GRNs and realistic cells corresponding to the biological system of interest.


Subject(s)
Algorithms , Computer Simulation , Gene Regulatory Networks , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Computational Biology/methods , Benchmarking , Sequence Analysis, RNA/methods , Single-Cell Gene Expression Analysis
7.
PLoS One ; 19(5): e0302696, 2024.
Article in English | MEDLINE | ID: mdl-38753612

ABSTRACT

Pathway enrichment analysis is a ubiquitous computational biology method to interpret a list of genes (typically derived from the association of large-scale omics data with phenotypes of interest) in terms of higher-level, predefined gene sets that share biological function, chromosomal location, or other common features. Among many tools developed so far, Gene Set Enrichment Analysis (GSEA) stands out as one of the pioneering and most widely used methods. Although originally developed for microarray data, GSEA is nowadays extensively utilized for RNA-seq data analysis. Here, we quantitatively assessed the performance of a variety of GSEA modalities and provide guidance in the practical use of GSEA in RNA-seq experiments. We leveraged harmonized RNA-seq datasets available from The Cancer Genome Atlas (TCGA) in combination with large, curated pathway collections from the Molecular Signatures Database to obtain cancer-type-specific target pathway lists across multiple cancer types. We carried out a detailed analysis of GSEA performance using both gene-set and phenotype permutations combined with four different choices for the Kolmogorov-Smirnov enrichment statistic. Based on our benchmarks, we conclude that the classic/unweighted gene-set permutation approach offered comparable or better sensitivity-vs-specificity tradeoffs across cancer types compared with other, more complex and computationally intensive permutation methods. Finally, we analyzed other large cohorts for thyroid cancer and hepatocellular carcinoma. We utilized a new consensus metric, the Enrichment Evidence Score (EES), which showed a remarkable agreement between pathways identified in TCGA and those from other sources, despite differences in cancer etiology. This finding suggests an EES-based strategy to identify a core set of pathways that may be complemented by an expanded set of pathways for downstream exploratory analysis. This work fills the existing gap in current guidelines and benchmarks for the use of GSEA with RNA-seq data and provides a framework to enable detailed benchmarking of other RNA-seq-based pathway analysis tools.
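The classic (unweighted) enrichment statistic referred to above is a Kolmogorov-Smirnov-style running sum over the ranked gene list; a compact sketch with a gene-set permutation null is shown below. This is an illustration of the statistic, not the GSEA software itself, and the toy gene list and set are invented.

```python
# Compact sketch of the classic (unweighted) GSEA enrichment score with a
# gene-set permutation null. Illustrative only; not the GSEA software itself.
import numpy as np

def enrichment_score(ranked_genes, gene_set):
    """Classic KS-style running sum: +1/Nh for hits, -1/(N-Nh) for misses;
    the ES is the maximum deviation from zero."""
    in_set = np.isin(ranked_genes, list(gene_set))
    n, n_hits = len(ranked_genes), in_set.sum()
    step = np.where(in_set, 1.0 / n_hits, -1.0 / (n - n_hits))
    running = np.cumsum(step)
    return running[np.argmax(np.abs(running))]

def gene_set_permutation_p(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Null distribution: random gene sets of the same size drawn from the list."""
    rng = np.random.default_rng(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    null = np.array([
        enrichment_score(ranked_genes,
                         rng.choice(ranked_genes, size=len(gene_set), replace=False))
        for _ in range(n_perm)
    ])
    # One-sided p-value in the direction of the observed score.
    if observed >= 0:
        return observed, (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, (np.sum(null <= observed) + 1) / (n_perm + 1)

# Toy example: 1,000 ranked genes, a 25-gene set concentrated near the top.
genes = np.array([f"g{i}" for i in range(1000)])
top_heavy_set = set(genes[np.r_[0:15, 200:210]])
es, p = gene_set_permutation_p(genes, top_heavy_set)
print(f"ES = {es:.3f}, gene-set permutation p = {p:.4f}")
```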


Subject(s)
Benchmarking , RNA-Seq , Humans , RNA-Seq/methods , Computational Biology/methods , Neoplasms/genetics , Databases, Genetic , Gene Expression Profiling/methods
8.
Eur J Epidemiol ; 39(4): 349-361, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38717556

ABSTRACT

Prospective benchmarking of an observational analysis against a randomized trial increases confidence in the benchmarking process as it relies exclusively on aligning the protocol of the trial and the observational analysis while the trial's findings are unavailable. The Randomized Evaluation of Decreased Usage of Betablockers After Myocardial Infarction (REDUCE-AMI, ClinicalTrials.gov ID: NCT03278509) trial started recruitment in September 2017 and results are expected in 2024. REDUCE-AMI aimed to estimate the effect of long-term use of beta blockers on the risk of death and myocardial infarction following a myocardial infarction with preserved left ventricular systolic ejection fraction. We specified the protocol of a target trial as similar as possible to that of REDUCE-AMI, then emulated the target trial using observational data from Swedish healthcare registries. Had everyone followed the treatment strategy as specified in the target trial protocol, the observational analysis estimated a reduction in the 5-year risk of death or myocardial infarction of 0.8 percentage points for beta blockers compared with no beta blockers; effects ranging from an absolute reduction of 4.5 percentage points to an increase of 2.8 percentage points in the risk of death or myocardial infarction were compatible with our data under conventional statistical criteria. Once results of REDUCE-AMI are published, we will compare the results of our observational analysis against those from the trial. If this prospective benchmarking is successful, it supports the credibility of additional analyses using these observational data, which can rapidly deliver answers to questions that could not be answered by the initial trial. If benchmarking proves unsuccessful, we will conduct a "postmortem" analysis to identify the reasons for the discrepancy. Prospective benchmarking shifts the investigators' focus away from an endeavour to use observational data to obtain results similar to those of a completed randomized trial, towards a systematic attempt to align the design and analysis of the trial and the observational analysis.


Subject(s)
Adrenergic beta-Antagonists , Benchmarking , Myocardial Infarction , Registries , Humans , Sweden , Prospective Studies , Adrenergic beta-Antagonists/therapeutic use , Female , Male , Aged , Randomized Controlled Trials as Topic , Middle Aged
9.
Br J Surg ; 111(5), 2024 May 03.
Article in English | MEDLINE | ID: mdl-38747683

ABSTRACT

BACKGROUND: Clinical auditing is a powerful tool to evaluate and improve healthcare. Deviations from the expected quality of care are identified by benchmarking the results of individual hospitals against national averages. This study aimed to evaluate the use of quality indicators for benchmarking hepato-pancreato-biliary (HPB) surgery and to determine when outlier hospitals could be identified. METHODS: This population-based study used data from two nationwide Dutch HPB audits (DHBA and DPCA) from 2014 to 2021. Sample size calculations determined the threshold (in percentage points) to identify centres as statistical outliers, based on current volume requirements (annual minimum of 20 resections) over a two-year period (2020-2021), covering mortality rate, failure to rescue (FTR), major morbidity rate and textbook/ideal outcome (TO) for minor liver resection (LR), major LR, pancreaticoduodenectomy (PD) and distal pancreatectomy (DP). RESULTS: In total, 10,963 and 7,365 patients who underwent liver and pancreatic resection respectively were included. Benchmark mortality rates (and corresponding ranges) were 0.6% (0-3.2%) and 3.3% (0-16.7%) for minor and major LR, and 2.7% (0-7.0%) and 0.6% (0-4.2%) for PD and DP respectively. FTR rates were 5.4% (0-33.3%), 14.2% (0-100%), 7.5% (1.6%-28.5%) and 3.1% (0-14.9%). For major morbidity rate, corresponding rates were 9.8% (0-20.5%), 28.1% (0-47.1%), 36% (15.8%-58.3%) and 22.3% (5.2%-46.1%). For TO, corresponding rates were 73.6% (61.3%-94.4%), 54.1% (35.3%-100%), 46.8% (25.3%-59.4%) and 63.3% (30.7%-84.6%). Mortality rate thresholds indicating a significant outlier were 8.6% and 15.4% for minor and major LR and 14.2% and 8.6% for PD and DP. For FTR, these thresholds were 17.9%, 31.6%, 22.9% and 15.0%. For major morbidity rate, these thresholds were 26.1%, 49.7%, 57.9% and 52.9% respectively. For TO, lower thresholds were 52.5%, 32.5%, 25.8% and 41.4% respectively. CONCLUSION: Current event rates and minimum volume requirements per hospital are too low to detect any meaningful between-hospital differences in mortality rate and FTR. Major morbidity rate and TO are better candidates to use for benchmarking.
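The dependence of outlier thresholds on hospital volume can be illustrated with a simple one-sided exact binomial calculation: given a benchmark event rate and a centre's case volume, find the smallest observed rate that would flag the centre at alpha = 0.05. This is a generic sketch, not the audits' actual sample-size methodology, and the example benchmark rate is taken loosely from the abstract.

```python
# Generic illustration (not the audits' exact methodology): the smallest observed
# event rate that flags a centre as a one-sided statistical outlier (alpha = 0.05)
# given a national benchmark rate and the centre's case volume.
from scipy.stats import binom

def outlier_threshold(benchmark_rate, n_cases, alpha=0.05):
    """Smallest k/n such that P(X >= k | benchmark_rate) < alpha."""
    for k in range(n_cases + 1):
        # binom.sf(k - 1, n, p) equals P(X >= k) under the benchmark rate.
        if binom.sf(k - 1, n_cases, benchmark_rate) < alpha:
            return k / n_cases
    return 1.0

benchmark_mortality = 0.027  # e.g. the ~2.7% benchmark mortality reported for PD
for volume in (20, 40, 80, 160):  # two-year case counts at various annual volumes
    thr = outlier_threshold(benchmark_mortality, volume)
    print(f"n = {volume:3d} resections -> outlier above {thr:.1%} observed mortality")
# Higher volumes lower the detectable threshold, matching the abstract's conclusion.
```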


Subject(s)
Benchmarking , Quality Indicators, Health Care , Humans , Netherlands/epidemiology , Pancreatectomy/standards , Pancreatectomy/mortality , Male , Pancreaticoduodenectomy/standards , Pancreaticoduodenectomy/mortality , Hepatectomy/mortality , Hepatectomy/standards , Female , Middle Aged , Aged , Hospital Mortality
10.
Nat Commun ; 15(1): 4134, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755121

ABSTRACT

Defining the number and abundance of different cell types in tissues is important for understanding disease mechanisms as well as for diagnostic and prognostic purposes. Typically, this is achieved by immunohistological analyses, cell sorting, or single-cell RNA-sequencing. Alternatively, cell-specific DNA methylome information can be leveraged to deconvolve cell fractions from a bulk DNA mixture. However, comprehensive benchmarking of deconvolution methods and modalities had not yet been performed. Here we evaluate 16 deconvolution algorithms, developed either specifically for DNA methylome data or more generically. We assess the performance of these algorithms, and the effect of normalization methods, while modeling variables that impact deconvolution performance, including cell abundance, cell type similarity, reference panel size, method for methylome profiling (array or sequencing), and technical variation. We observe differences in algorithm performance depending on each of these variables, emphasizing the need to tailor deconvolution analyses. The complexity of the reference, the marker selection method, the number of marker loci and, for sequencing-based assays, the sequencing depth have a marked influence on performance. By developing handles to select the optimal analysis configuration, we provide a valuable source of information for studies aiming to deconvolve array- or sequencing-based methylation data.
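A minimal sketch of the task all 16 algorithms address, reference-based methylome deconvolution: bounded least squares of a bulk beta-value profile against a cell-type reference atlas, followed by normalization to fractions. The atlas, cell types, and noise model below are simulated assumptions, and real methods differ in constraints and marker selection.

```python
# Sketch of reference-based deconvolution of a bulk methylation profile:
# bounded least squares against an atlas of cell-type-specific beta values at
# marker CpGs, then normalization to fractions. Illustrative only.
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
cell_types = ["neutrophil", "monocyte", "T_cell", "B_cell"]
n_cpgs = 200

# Reference atlas: beta values (0..1) at marker CpGs for each cell type.
atlas = rng.uniform(0.05, 0.95, size=(n_cpgs, len(cell_types)))

# Simulate a bulk mixture with known fractions plus technical noise.
true_fractions = np.array([0.55, 0.10, 0.25, 0.10])
bulk = atlas @ true_fractions + rng.normal(0, 0.02, n_cpgs)

# Bounded least squares (coefficients in [0, 1]), then rescale to sum to one.
fit = lsq_linear(atlas, bulk, bounds=(0.0, 1.0))
estimated = fit.x / fit.x.sum()

for ct, est, true in zip(cell_types, estimated, true_fractions):
    print(f"{ct:10s} estimated {est:.3f} (true {true:.2f})")
```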


Subject(s)
Algorithms , Benchmarking , DNA Methylation , Epigenome , Humans , Sequence Analysis, DNA/methods , DNA/genetics , High-Throughput Nucleotide Sequencing/methods
11.
Sci Rep ; 14(1): 11438, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38763952

ABSTRACT

The utilization of artificial intelligence (AI) in healthcare is on the rise, demanding increased accessibility to (public) medical data for benchmarking. The digitization of healthcare in recent years has facilitated medical data scientists' access to extensive hospital data, fostering AI-based research. A notable addition to this trend is the Salzburg Intensive Care database (SICdb), made publicly available in early 2023. Covering more than 27,000 intensive care admissions at the University Hospital Salzburg from 2013 to 2021, this dataset presents a valuable resource for AI-driven investigations. This article explores the SICdb and conducts a comparative analysis with the widely recognized Medical Information Mart for Intensive Care - version IV (MIMIC-IV) database. The comparison focuses on key aspects, emphasizing the availability and granularity of data provided by the SICdb, particularly vital signs and laboratory measurements. The analysis demonstrates that the SICdb offers more detailed information with higher data availability and temporal resolution for signal data, especially for vital signs, compared to the MIMIC-IV. This is advantageous for longitudinal studies of patients' health conditions in the intensive care unit. The SICdb provides a valuable resource for medical data scientists and researchers. The database offers comprehensive and diverse healthcare data in a European country, making it well suited for benchmarking and enhancing AI-based healthcare research. The findings emphasize the importance of ongoing efforts to expand public datasets and make them available for advancing AI applications in the healthcare domain.


Subject(s)
Critical Care , Databases, Factual , Intensive Care Units , Humans , Artificial Intelligence , Male , Female , Aged , Middle Aged , Adult , Aged, 80 and over , Benchmarking , Adolescent
12.
PLoS One ; 19(5): e0301696, 2024.
Article in English | MEDLINE | ID: mdl-38781237

ABSTRACT

In the domain of question subjectivity classification, there exists a need for detailed datasets that can foster advancements in Automatic Subjective Question Answering (ASQA) systems. Addressing the prevailing research gaps, this paper introduces the Fine-Grained Question Subjectivity Dataset (FQSD), which comprises 10,000 questions. The dataset distinguishes between subjective and objective questions and offers additional categorizations such as Subjective-types (Target, Attitude, Reason, Yes/No, None) and Comparison-form (Single, Comparative). Annotation reliability was confirmed via robust evaluation techniques, yielding a Fleiss's Kappa score of 0.76 and Pearson correlation values up to 0.80 among three annotators. We benchmarked FQSD against existing datasets such as those of Yu, Zha, and Chua (2012), SubjQA (Bjerva 2020), and ConvEx-DS (Hernandez-Bocanegra 2021). Our dataset excelled in scale, linguistic diversity, and syntactic complexity, establishing a new standard for future research. We employed visual methodologies to provide a nuanced understanding of the dataset and its classes. Among the transformer-based models used for validation (BERT, XLNet, and RoBERTa), RoBERTa achieved an outstanding F1-score of 97%, confirming the dataset's efficacy for the advanced subjectivity classification task. Furthermore, we utilized Local Interpretable Model-agnostic Explanations (LIME) to elucidate model decision-making, ensuring transparent and reliable model predictions in subjectivity classification tasks.
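A minimal sketch of how a transformer classifier such as RoBERTa can be wired up for the binary subjective/objective task. The `roberta-base` checkpoint here has a freshly initialized (untrained) classification head, so it must be fine-tuned on FQSD before its predictions mean anything; the example questions are invented.

```python
# Minimal sketch: a RoBERTa sequence classifier for subjective vs. objective
# questions. The classification head of "roberta-base" is randomly initialized,
# so it must be fine-tuned on FQSD (or similar data) before use.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["objective", "subjective"]
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(labels)
)

questions = [
    "What is the battery capacity of this phone?",         # objective
    "Which of these two laptops feels nicer to type on?",  # subjective, comparative
]
batch = tokenizer(questions, padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    probs = torch.softmax(model(**batch).logits, dim=-1)

for question, p in zip(questions, probs):
    print(f"{question}\n  -> P(subjective) = {p[1].item():.2f} (untrained head; illustrative)")
```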


Subject(s)
Benchmarking , Humans , Benchmarking/methods , Reproducibility of Results
13.
Nat Commun ; 15(1): 4376, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38782890

ABSTRACT

Topologically associating domains (TADs), megabase-scale features of chromatin spatial architecture, are organized in a domain-within-domain TAD hierarchy. Within TADs, the inner and smaller subTADs not only manifest cell-to-cell variability, but also precisely regulate transcription and differentiation. Although over 20 TAD callers are able to detect TADs, their usability in biomedicine is limited by disagreement among their outputs and by a limited understanding of the TAD hierarchy. We compare 13 computational tools across various conditions and develop a metric to evaluate the similarity of TAD hierarchy. Although outputs of the TAD hierarchy at each level vary with the caller, data resolution, sequencing depth, and matrix normalization, they are more consistent when the larger TADs agree more closely. We present comprehensive benchmarking of TAD hierarchy callers and operational guidance for life science researchers. Moreover, by simulating the mixing of different types of cells, we confirm that TAD hierarchy is generated not simply from stacking Hi-C heatmaps of heterogeneous cells. Finally, we propose an air conditioner model to decipher the role of TAD hierarchy in transcription.
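The paper's own similarity metric for TAD hierarchies is not reproduced here; as a hedged illustration, one simple per-level measure is a Jaccard index over TAD boundaries with a small bin tolerance, sketched below with invented domain coordinates.

```python
# Hedged illustration (not the paper's metric): per-level agreement between two
# TAD callers measured as a Jaccard index over boundaries, allowing a small
# tolerance in bins to absorb resolution differences.
def boundary_jaccard(tads_a, tads_b, tol_bins=1):
    """tads_* are lists of (start_bin, end_bin) domains at one hierarchy level."""
    def boundaries(tads):
        return sorted({b for start, end in tads for b in (start, end)})

    def matched(src, dst):
        return sum(any(abs(s - d) <= tol_bins for d in dst) for s in src)

    ba, bb = boundaries(tads_a), boundaries(tads_b)
    inter = (matched(ba, bb) + matched(bb, ba)) / 2.0
    union = len(ba) + len(bb) - inter
    return inter / union if union else 1.0

caller_x = [(0, 40), (40, 95), (95, 150)]   # level-1 TADs from caller X
caller_y = [(0, 41), (41, 90), (90, 150)]   # level-1 TADs from caller Y
print(f"level-1 boundary Jaccard = {boundary_jaccard(caller_x, caller_y):.2f}")
```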


Subject(s)
Benchmarking , Chromatin , Chromatin/chemistry , Humans , Computational Biology/methods , Software , Chromatin Assembly and Disassembly
14.
Elife ; 12, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38787371

ABSTRACT

Spatial transcriptomics (ST) technologies allow the profiling of the transcriptome of cells while keeping their spatial context. Since most commercial untargeted ST technologies do not yet operate at single-cell resolution, computational methods such as deconvolution are often used to infer the cell type composition of each sequenced spot. We benchmarked 11 deconvolution methods using 63 silver standards, 3 gold standards, and 2 case studies on liver and melanoma tissues. We developed a simulation engine called synthspot to generate silver standards from single-cell RNA-sequencing data, while gold standards are generated by pooling single cells from targeted ST data. We evaluated methods based on their performance, stability across different reference datasets, and scalability. We found that cell2location and RCTD are the top-performing methods, but surprisingly, a simple regression model outperforms almost half of the dedicated spatial deconvolution methods. Furthermore, we observe that the performance of all methods significantly decreased in datasets with highly abundant or rare cell types. Our results are reproducible in a Nextflow pipeline, which also allows users to generate synthetic data, run deconvolution methods and optionally benchmark them on their dataset (https://github.com/saeyslab/spotless-benchmark).
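The "simple regression model" baseline mentioned above can be approximated as per-spot non-negative least squares against cell-type signature profiles, scored against the known proportions of synthetic spots. This is a sketch on simulated data, not the synthspot engine or any benchmarked method's actual implementation.

```python
# Sketch of a simple regression baseline for spot deconvolution: per-spot
# non-negative least squares against cell-type signature profiles, scored by
# RMSE against known proportions of synthetic spots. Illustrative only.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_genes, n_cell_types, n_spots = 500, 5, 100

# Signature matrix: mean expression of marker genes per cell type.
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_cell_types))

# Synthetic spots: random cell-type proportions plus Gaussian noise.
true_props = rng.dirichlet(alpha=np.ones(n_cell_types), size=n_spots)       # (spots, types)
spots = true_props @ signatures.T + rng.normal(0, 0.1, (n_spots, n_genes))  # (spots, genes)

estimates = np.zeros_like(true_props)
for s in range(n_spots):
    coef, _ = nnls(signatures, spots[s])
    estimates[s] = coef / coef.sum() if coef.sum() > 0 else 1.0 / n_cell_types

rmse = np.sqrt(np.mean((estimates - true_props) ** 2))
print(f"per-spot NNLS baseline RMSE vs. ground truth: {rmse:.3f}")
```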


Subject(s)
Benchmarking , Gene Expression Profiling , Transcriptome , Humans , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Software , Computational Biology/methods , Sequence Analysis, RNA/methods , Melanoma/genetics , Reproducibility of Results , Liver
15.
J Environ Manage ; 359: 121017, 2024 May.
Article in English | MEDLINE | ID: mdl-38718602

ABSTRACT

The energy transition is currently bringing attention to fuel cell micro-combined heat and power (mCHP) systems for residential use. The two main technologies already commercialized are Proton Exchange Membrane Fuel Cells (PEMFCs) and Solid Oxide Fuel Cells (SOFCs). The pollutant emissions of one system of each technology have been tested with a portable probe in both laboratory and field-test configurations. In this paper, the nitrogen oxides (NOx), sulphur dioxide (SO2), and carbon monoxide (CO) emission levels are compared to those of other combustion technologies, such as a recent Euro 6 diesel vehicle, a classical gas condensing boiler, and a gas absorption heat pump. Finally, a method of converting the pollutant concentrations (in ppm) measured by the sensors into pollutant intensities per unit of energy (in mg/kWh) is documented and reported. This allows the pollutant emission levels to be compared with the relevant literature, especially studies conducted with other measuring sensors. Both tested residential fuel cell technologies fed by natural gas can be considered clean regarding SO2 and NOx emissions. The CO emissions can be considered quite low for the tested SOFC and even nil for the tested PEMFC. The biggest issue of natural gas fuel cell technologies still lies in the carbon dioxide (CO2) emissions associated with the fossil fuel they consume. The gas absorption heat pump, however, shows worse NOx and CO levels than the classical gas condensing boiler. Lastly, this study illustrates that the high level of hybridization between a fuel cell and a gas boiler may be responsible for unexpected ON/OFF cycling behaviours and therefore prevent both sub-systems from operating as optimally and reliably as they would as standalone units.
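The general principle of converting a measured concentration in ppm into an emission intensity in mg/kWh can be illustrated as below, assuming ideal-gas conditions, an O2 dilution correction, and an assumed specific dry flue-gas volume per kWh of fuel input; this is a textbook-style sketch under stated assumptions, not the exact method documented in the paper.

```python
# Illustrative conversion of a measured pollutant concentration (ppm by volume)
# into an emission intensity per unit of energy (mg/kWh). Generic calculation
# under stated assumptions, not the method documented in the paper.
MOLAR_VOLUME_NM3_PER_KMOL = 22.414  # ideal gas at 0 degC, 101.325 kPa

def ppm_to_mg_per_nm3(ppm, molar_mass_g_per_mol):
    """ppm (by volume) -> mg per normal cubic metre of flue gas."""
    return ppm * molar_mass_g_per_mol / MOLAR_VOLUME_NM3_PER_KMOL

def o2_correction(conc, o2_measured_pct, o2_reference_pct=3.0):
    """Normalize a concentration to a reference O2 level (dilution correction)."""
    return conc * (20.9 - o2_reference_pct) / (20.9 - o2_measured_pct)

def mg_per_kwh(ppm, molar_mass, flue_gas_nm3_per_kwh, o2_measured_pct, o2_ref_pct=3.0):
    conc = ppm_to_mg_per_nm3(ppm, molar_mass)
    conc_ref = o2_correction(conc, o2_measured_pct, o2_ref_pct)
    return conc_ref * flue_gas_nm3_per_kwh

# Example: 5 ppm NO (M = 30 g/mol) measured at 5% O2 in the exhaust of a
# natural-gas appliance, assuming ~1.0 Nm3 of dry flue gas per kWh of fuel
# input at the reference O2 level (assumed value; depends on fuel and excess air).
print(f"{mg_per_kwh(5, 30.0, 1.0, o2_measured_pct=5.0):.1f} mg/kWh NO")
```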


Subject(s)
Air Pollutants , Nitrogen Oxides , Air Pollutants/analysis , Nitrogen Oxides/analysis , Carbon Monoxide/analysis , Sulfur Dioxide/analysis , Benchmarking , Vehicle Emissions/analysis , Environmental Monitoring/methods
16.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38797968

ABSTRACT

A major challenge of precision oncology is the identification and prioritization of suitable treatment options based on molecular biomarkers of the considered tumor. In pursuit of this goal, large cancer cell line panels have successfully been studied to elucidate the relationship between cellular features and treatment response. Due to the high dimensionality of these datasets, machine learning (ML) is commonly used for their analysis. However, choosing a suitable algorithm and set of input features can be challenging. We performed a comprehensive benchmarking of ML methods and dimension reduction (DR) techniques for predicting drug response metrics. Using the Genomics of Drug Sensitivity in Cancer cell line panel, we trained random forests, neural networks, boosting trees and elastic nets for 179 anti-cancer compounds with feature sets derived from nine DR approaches. We compare the results regarding statistical performance, runtime and interpretability. Additionally, we provide strategies for assessing model performance compared with a simple baseline model and measuring the trade-off between models of different complexity. Lastly, we show that complex ML models benefit from using an optimized DR strategy, and that standard models, even when using considerably fewer features, can still be superior in performance.
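The kind of comparison described above, a dimension-reduction plus ML pipeline against a simple baseline, can be sketched with scikit-learn on synthetic data; this is illustrative only and does not reproduce the GDSC analysis, its feature sets, or its tuned configurations.

```python
# Sketch: a DR + ML pipeline (PCA + random forest) versus an elastic net
# baseline, cross-validated on synthetic "omics features -> drug response" data.
# Illustrative only; not the GDSC analysis or its tuned configurations.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# High-dimensional features (e.g. expression) with few informative ones.
X, y = make_regression(n_samples=300, n_features=2000, n_informative=50,
                       noise=10.0, random_state=0)

models = {
    "PCA(50) + random forest": make_pipeline(
        StandardScaler(), PCA(n_components=50),
        RandomForestRegressor(n_estimators=200, random_state=0)),
    "elastic net (no DR)": make_pipeline(
        StandardScaler(), ElasticNet(alpha=1.0, l1_ratio=0.5)),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:24s} mean CV R^2 = {scores.mean():.3f}")
```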


Subject(s)
Algorithms , Antineoplastic Agents , Benchmarking , Machine Learning , Humans , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Neoplasms/drug therapy , Neoplasms/genetics , Neural Networks, Computer , Cell Line, Tumor
17.
Nature ; 630(8015): 181-188, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38778098

ABSTRACT

Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles [1-3]. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context [4]. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000 patients covering 31 major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides. To scale GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath adapts the newly developed LongNet [5] method to digital pathology. To evaluate Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data [6]. With large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains state-of-the-art performance on 25 out of 26 tasks, with significant improvement over the second-best method on 18 tasks. We further demonstrate the potential of Prov-GigaPath on vision-language pretraining for pathology [7,8] by incorporating the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modelling.


Subject(s)
Neoplasms , Humans , Neoplasms/pathology , Benchmarking , Pathology, Clinical , Image Processing, Computer-Assisted
18.
J Environ Manage ; 360: 121070, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38744210

ABSTRACT

Countries' circularity performance and CO2 emissions should be addressed as a part of the UN net-zero Sustainable Development Goals (SDGs) 2030. Macro-scale circularity assessment is regarded as a helpful tool for tracking and adjusting nations' progress toward the sustainable Circular Economy (CE) and SDGs. However, practical frameworks are required to address the shortage of real-world circularity assessments at the macro level. The establishment of CE benchmarks is also essential to enhance circularity in less sustainable nations. Further, monitoring the extent to which nations' circularity activities are sustainable and in line with the SDGs is an area that lacks sufficient practical research. The current research aims to develop a macro-level framework and benchmarks for national sustainable circularity assessments. Methodologically, we develop a dynamic network data envelopment analysis (DN-DEA) framework for multi-period circularity and eco-efficiency assessment of OECD countries. To do so, we incorporate dual-role and bidirectional carryovers in our macro-scale framework. From a managerial perspective, we conduct a novel comparative analysis of the circularity and eco-efficiency of the nations to monitor macro-scale sustainable CE trends. Research results reveal a significant performance disparity in circularity, eco-efficiency, and benchmarking patterns. Accordingly, circularly efficient nations cannot necessarily be considered eco-friendly and sustainable. Although Germany (as a superior circular nation) can be regarded as a circularity benchmark, it cannot serve as an eco-efficiency benchmark for less eco-efficient nations. Hence, the new method allows decision-makers not only to identify the nations' circularity outcome but also to distinguish sustainable nations from less sustainable ones. This, on the one hand, provides policymakers with a multi-faceted sustainability analysis, beyond the previous unidimensional analysis. On the other, it proposes improvement benchmarks for planning and regulating nations' future circularity in line with real sustainability goals. The capabilities of our innovative approach are demonstrated in the case study.
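The dynamic network DEA model with carryovers is beyond a short example, but the underlying idea of relative-efficiency scoring can be illustrated with a basic input-oriented CCR DEA model solved as a linear program; the toy country inputs and outputs below are invented and the formulation is a simplification, not the authors' DN-DEA.

```python
# Basic input-oriented CCR DEA solved as a linear program (envelopment form).
# A simplified illustration of relative-efficiency scoring; not the paper's
# dynamic network DEA model with dual-role and bidirectional carryovers.
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(inputs, outputs, dmu):
    """Input-oriented CCR efficiency (theta) of decision-making unit `dmu`.
    inputs: (n_dmus, n_inputs), outputs: (n_dmus, n_outputs)."""
    n_dmus, n_in = inputs.shape
    n_out = outputs.shape[1]
    # Decision variables: [theta, lambda_1, ..., lambda_n]; minimize theta.
    c = np.r_[1.0, np.zeros(n_dmus)]
    # Inputs:  sum_j lambda_j * x_ij - theta * x_i,dmu <= 0
    A_in = np.hstack([-inputs[dmu].reshape(n_in, 1), inputs.T])
    # Outputs: -sum_j lambda_j * y_rj <= -y_r,dmu
    A_out = np.hstack([np.zeros((n_out, 1)), -outputs.T])
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(n_in), -outputs[dmu]],
                  bounds=[(None, None)] + [(0, None)] * n_dmus,
                  method="highs")
    return res.fun

# Toy data: 5 countries; inputs = (material use, energy), outputs = (GDP, recycling).
inputs = np.array([[5, 8], [4, 6], [6, 9], [3, 5], [7, 10]], dtype=float)
outputs = np.array([[9, 4], [8, 5], [7, 3], [8, 6], [9, 3]], dtype=float)
for j in range(len(inputs)):
    print(f"country {j}: CCR efficiency = {ccr_efficiency(inputs, outputs, j):.3f}")
```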


Subject(s)
Organisation for Economic Co-Operation and Development , Sustainable Development , Conservation of Natural Resources/methods , Benchmarking , Carbon Dioxide/analysis
19.
PLoS One ; 19(5): e0301720, 2024.
Article in English | MEDLINE | ID: mdl-38739583

ABSTRACT

A key benefit of the Open Computing Language (OpenCL) software framework is its capability to operate across diverse architectures. Field programmable gate arrays (FPGAs) are a high-speed computing architecture used for computation acceleration. This study investigates the impact of memory access time on overall performance in general FPGA computing environments through the creation of eight benchmarks within the OpenCL framework. The developed benchmarks capture a range of memory access behaviors, and they play a crucial role in assessing the performance of spinning and sleeping on FPGA-based architectures. The results obtained guide the formulation of new implementations and contribute to defining an abstraction of FPGAs. This abstraction is then utilized to create tailored implementations of primitives that are well-suited for this platform. While other research endeavors concentrate on creating benchmarks with the Compute Unified Device Architecture (CUDA) to scrutinize the memory systems across diverse GPU architectures and propose recommendations for future generations of GPU computation platforms, this study delves into the memory system analysis for the broader FPGA computing platform. It achieves this by employing the highly abstracted OpenCL framework, exploring various data workload characteristics, and experimentally delineating the appropriate implementation of primitives that can seamlessly integrate into a design tailored for the FPGA computing platform. Additionally, the results underscore the efficacy of employing a task-parallel model to mitigate the need for high-cost synchronization mechanisms in designs constructed on general FPGA computing platforms.


Subject(s)
Benchmarking , Software , Humans , Programming Languages
20.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38796691

ABSTRACT

Limited gene capture efficiency and spot size of spatial transcriptome (ST) data pose significant challenges in cell-type characterization. The heterogeneity and complexity of cell composition in the mammalian brain make it more challenging to accurately annotate ST data from the brain. Many algorithms attempt to characterize subtypes of neurons by integrating ST data with single-nucleus RNA sequencing (snRNA-seq) or single-cell RNA sequencing. However, assessing the accuracy of these algorithms on Stereo-seq ST data remains unresolved. Here, we benchmarked 9 mapping algorithms using 10 ST datasets from four mouse brain regions at two different resolutions and 24 pseudo-ST datasets from snRNA-seq. Both actual ST data and pseudo-ST data were mapped using snRNA-seq datasets from the corresponding brain regions as reference data. After comparing performance across different areas and resolutions of the mouse brain, we conclude that both robust cell-type decomposition (RCTD) and SpatialDWLS demonstrated superior robustness and accuracy in cell-type annotation. Testing with publicly available snRNA-seq data from another sequencing platform in the cortex region further validated our conclusions. Altogether, we developed a workflow for assessing the suitability of mapping algorithms for ST datasets, which can improve the efficiency and accuracy of spatial data annotation.


Subject(s)
Algorithms , Benchmarking , Brain , Single-Cell Analysis , Animals , Mice , Brain/metabolism , Single-Cell Analysis/methods , RNA-Seq/methods , Transcriptome , Sequence Analysis, RNA/methods , Neurons/metabolism , Gene Expression Profiling/methods