Results 1 - 20 of 1,565
1.
PLoS One ; 19(5): e0301696, 2024.
Article in English | MEDLINE | ID: mdl-38781237

ABSTRACT

In the domain of question subjectivity classification, there is a need for detailed datasets that can foster advances in Automatic Subjective Question Answering (ASQA) systems. To address prevailing research gaps, this paper introduces the Fine-Grained Question Subjectivity Dataset (FQSD), which comprises 10,000 questions. The dataset distinguishes between subjective and objective questions and offers additional categorizations such as Subjective-types (Target, Attitude, Reason, Yes/No, None) and Comparison-form (Single, Comparative). Annotation reliability was confirmed via robust evaluation techniques, yielding a Fleiss' kappa of 0.76 and Pearson correlation values up to 0.80 among three annotators. We benchmarked FQSD against existing datasets such as (Yu, Zha, and Chua 2012), SubjQA (Bjerva 2020), and ConvEx-DS (Hernandez-Bocanegra 2021). Our dataset excelled in scale, linguistic diversity, and syntactic complexity, establishing a new standard for future research. We employed visual methodologies to provide a nuanced understanding of the dataset and its classes. We validated the dataset with transformer-based models such as BERT, XLNet, and RoBERTa; RoBERTa achieved an F1-score of 97%, confirming the dataset's efficacy for the advanced subjectivity classification task. Furthermore, we used Local Interpretable Model-agnostic Explanations (LIME) to elucidate model decision-making, ensuring transparent and reliable model predictions in subjectivity classification tasks.


Subject(s)
Benchmarking , Humans , Benchmarking/methods , Reproducibility of Results
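
For reference, Fleiss' kappa, the agreement statistic reported for the three FQSD annotators, compares the observed share of agreeing rater pairs per item with the agreement expected from the overall category frequencies. The Python sketch below is a generic textbook implementation, not the authors' evaluation code; the label strings in the usage example are placeholders.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for items that were each labelled by the same number of raters.

    ratings[i] lists the labels the raters assigned to item i.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("need at least two raters and the same count for every item")
    categories = sorted({label for r in ratings for label in r})

    # n_ij: number of raters who put item i into category j
    counts = [Counter(r) for r in ratings]

    # P_i: share of agreeing rater pairs on item i
    pair_agreement = [
        (sum(c[j] ** 2 for j in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(pair_agreement) / n_items

    # p_j: share of all assignments that fall into category j; chance agreement is sum of p_j^2
    total = n_items * n_raters
    expected = sum((sum(c[j] for c in counts) / total) ** 2 for j in categories)

    # Undefined when expected == 1 (every rating falls in a single category)
    return (observed - expected) / (1 - expected)
```

For example, `fleiss_kappa([["subjective"] * 3, ["objective"] * 3])` returns 1.0, since every item receives unanimous labels.
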
3.
PLoS Comput Biol ; 20(4): e1011990, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38598551

ABSTRACT

Prostate cancer is a heritable disease with ancestry-biased incidence and mortality. Polygenic risk scores (PRSs) offer promising advancements in predicting disease risk, including prostate cancer. While their accuracy continues to improve, research aimed at enhancing their effectiveness within African and Asian populations remains key for equitable use. Recent algorithmic developments for PRS derivation have resulted in improved pan-ancestral risk prediction for several diseases. In this study, we benchmark the predictive power of six widely used PRS derivation algorithms, four of which adjust for ancestry, against prostate cancer cases and controls from the UK Biobank and All of Us cohorts. We find modest improvement in discriminatory ability when compared with a simple method that prioritizes variants, clumping, and published polygenic risk scores. Our findings underscore the importance of improving upon risk prediction algorithms and the sampling of diverse cohorts.


Subject(s)
Algorithms , Benchmarking , Genetic Predisposition to Disease , Multifactorial Inheritance , Prostatic Neoplasms , Humans , Prostatic Neoplasms/genetics , Male , Benchmarking/methods , Genetic Predisposition to Disease/genetics , Multifactorial Inheritance/genetics , Cohort Studies , Risk Factors , Polymorphism, Single Nucleotide/genetics , Genome-Wide Association Study/methods , Computational Biology/methods , Risk Assessment/methods , Case-Control Studies , Genetic Risk Score
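
In its simplest form, a polygenic risk score is a weighted sum of an individual's effect-allele dosages, with per-variant weights taken from association studies. The sketch below assumes the variants and weights have already been chosen; the derivation algorithms benchmarked in this study differ precisely in how they select variants (for example, by clumping) and adjust the weights, which is not modelled here. All values are hypothetical.

```python
def polygenic_risk_score(dosages: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted sum of effect-allele dosages, one score per individual.

    dosages[i][k]: copies (0-2) of the effect allele of variant k carried by individual i
    weights[k]:    per-variant effect size, e.g. a GWAS log odds ratio (hypothetical here)
    """
    if any(len(row) != len(weights) for row in dosages):
        raise ValueError("each individual needs one dosage per weighted variant")
    return [sum(g * b for g, b in zip(row, weights)) for row in dosages]


# Two hypothetical individuals scored on three hypothetical variants
scores = polygenic_risk_score([[0, 1, 2], [2, 2, 0]], [0.1, -0.2, 0.3])
```

Here the first individual scores about 0.4 and the second about -0.2; real scores are usually standardised against a reference population before comparing risk groups.
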
4.
Mol Ecol Resour ; 24(5): e13960, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38676702

ABSTRACT

There is growing interest in uncovering genetic kinship patterns in past societies using low-coverage palaeogenomes. Here, we benchmark four tools for kinship estimation with such data: lcMLkin, NgsRelate, KIN, and READ, which differ in their input, IBD estimation methods, and statistical approaches. We used pedigree and ancient genome sequence simulations to evaluate these tools when only a limited number (1 to 50 K, with minor allele frequency ≥0.01) of shared SNPs are available. The performance of all four tools was comparable using ≥20 K SNPs. We found that first-degree related pairs can be accurately classified even with 1 K SNPs, with 85% F1 scores using READ and 96% using NgsRelate or lcMLkin. Distinguishing third-degree relatives from unrelated pairs or second-degree relatives was also possible with high accuracy (F1 > 90%) with 5 K SNPs using NgsRelate and lcMLkin, while READ and KIN showed lower success (69 and 79% respectively). Meanwhile, noise in population allele frequencies and inbreeding (first-cousin mating) led to deviations in kinship coefficients, with different sensitivities across tools. We conclude that using multiple tools in parallel might be an effective approach to achieve robust estimates on ultra-low-coverage genomes.


Subject(s)
Benchmarking , Pedigree , Polymorphism, Single Nucleotide , Benchmarking/methods , Humans , Gene Frequency , DNA, Ancient/analysis , Computer Simulation , Genetics, Population/methods , Computational Biology/methods
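
The F1 scores quoted above are the harmonic mean of precision and recall for a given relatedness class. A minimal one-vs-rest sketch follows, assuming each pair of individuals carries a single true and a single predicted degree label; the class names are hypothetical, and the abstract does not state the exact scoring scheme used for each tool.

```python
def f1_for_class(true_labels: list[str], predicted_labels: list[str], positive: str) -> float:
    """One-vs-rest F1 for a single relatedness class (e.g. a hypothetical "first" degree label)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be the same length")
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are zero or undefined, so report 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For instance, with true labels `["first", "first", "unrelated", "second"]` and predictions `["first", "unrelated", "first", "second"]`, the "first" class has one true positive, one false positive and one false negative, giving F1 = 0.5.
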
5.
IEEE J Biomed Health Inform ; 28(6): 3523-3533, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38557613

ABSTRACT

Germectomy is a common surgery in pediatric dentistry to prevent the potential dangers caused by impacted mandibular wisdom teeth. Segmentation of mandibular wisdom teeth is a crucial step in surgery planning. However, manually segmenting teeth and bones from 3D volumes is time-consuming and may delay treatment. Deep-learning-based medical image segmentation methods have demonstrated the potential to reduce the burden of manual annotation, but they still require a lot of well-annotated training data. In this paper, we first curated a Cone Beam Computed Tomography (CBCT) dataset, NKUT, for the segmentation of pediatric mandibular wisdom teeth; it is the first publicly available dataset in this domain. Second, we propose a semantic separation scale-specific feature fusion network named WTNet, which introduces two branches to address the teeth and bones segmentation tasks. In WTNet, we design an Input Enhancement (IE) block and a Teeth-Bones Feature Separation (TBFS) block to solve the feature confusion and semantic-blur problems in our task. Experimental results suggest that WTNet performs better on NKUT than previous state-of-the-art segmentation methods (such as TransUnet), with a maximum DSC lead of nearly 16%.


Subject(s)
Cone-Beam Computed Tomography , Databases, Factual , Deep Learning , Molar, Third , Humans , Child , Cone-Beam Computed Tomography/methods , Molar, Third/diagnostic imaging , Mandible/diagnostic imaging , Benchmarking/methods , Imaging, Three-Dimensional/methods , Algorithms
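
DSC here is the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The sketch below computes it on flattened binary voxel masks; how the WTNet evaluation handles empty masks or per-class averaging is not given in the abstract, so returning 1.0 for two empty masks is an assumption.

```python
def dice_coefficient(pred: list[int], truth: list[int]) -> float:
    """Dice similarity coefficient of two flattened binary masks (1 = structure, 0 = background)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2 * intersection / size_sum
```

For example, masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]` share one foreground voxel out of four labelled in total, so the DSC is 0.5. A "DSC lead" between methods is then a difference between such scores.
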
6.
Clin Exp Optom ; 107(2): 196-203, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37952255

ABSTRACT

CLINICAL RELEVANCE: Realistic benchmarks can serve as comparators for optometrists wishing to engage in clinical practice audits of their glaucoma care. BACKGROUND: The iCareTrack study established the appropriateness of glaucoma care delivery through clinical record audits of Australian optometry practices. Benchmarks required for monitoring and improving glaucoma care delivery do not exist. This study developed realistic benchmarks for glaucoma care and then benchmarked the performance of practices from the iCareTrack study to establish aspects of care that warrant attention from quality improvement initiatives. METHODS: Benchmarks were developed from the pre-existing iCareTrack dataset using the Achievable Benchmarks of Care (ABC) method. The iCareTrack study had audited the appropriateness of glaucoma care delivery against 37 clinical indicators for 420 randomly sampled glaucoma patient records from 42 Australian optometry practices. The four-step ABC method calculates benchmarks based on the top 10% of best-performing practices adjusted for low patient encounter numbers. iCareTrack results were compared to the benchmarks to explore the distribution of practices that were at, above or below benchmark. RESULTS: Benchmarks were developed for 34 of 37 iCareTrack indicators. For 26 (of 34) indicators, the benchmarks were at or above 90% appropriateness. The benchmarks for 14 (of 34) iCareTrack indicators were met by more than 80% of eligible practices, indicating excellent performance. Some aspects of glaucoma care such as peripheral anterior angle assessment, applanation tonometry, and visual field assessment appeared to be delivered sub-optimally by optometrists when compared to the benchmarks. CONCLUSION: This study established benchmarks for glaucoma care delivery in optometry practices that reflect realistic and top achievable performance. 
The large number of indicators with benchmarks above 90% confirmed that glaucoma care can and should be delivered by optometrists at very high levels of appropriateness. Benchmarking identified pockets of sub-optimal performance that can now be targeted by quality improvement initiatives.


Subject(s)
Glaucoma , Optometry , Humans , Benchmarking/methods , Australia , Glaucoma/therapy , Delivery of Health Care , Optometry/methods
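
The four steps of the ABC method are not spelled out in the abstract. The sketch below follows the commonly cited Kiefe et al. formulation as an assumption: compute an adjusted performance fraction (x + 1) / (n + 2) per practice, rank practices by it, take the top-ranked practices until they cover at least 10% of all eligible records, and report their pooled unadjusted performance. Tie handling at the cut-off is simplified, and the practice names and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Practice:
    name: str
    met: int       # eligible records that met the indicator
    eligible: int  # records eligible for the indicator


def achievable_benchmark(practices: list[Practice], top_share: float = 0.10) -> float:
    pool = [p for p in practices if p.eligible > 0]
    if not pool:
        raise ValueError("no practice has eligible records for this indicator")

    # Step 1: adjusted performance fraction (x + 1) / (n + 2) pulls small practices toward 50%
    def adjusted(p: Practice) -> float:
        return (p.met + 1) / (p.eligible + 2)

    # Step 2: rank practices from best to worst adjusted performance
    ranked = sorted(pool, key=adjusted, reverse=True)

    # Step 3: take top-ranked practices until they hold at least top_share of all eligible records
    needed = top_share * sum(p.eligible for p in pool)
    selected, covered = [], 0
    for p in ranked:
        selected.append(p)
        covered += p.eligible
        if covered >= needed:
            break

    # Step 4: benchmark = pooled (unadjusted) performance of the selected practices
    return sum(p.met for p in selected) / covered
```

For instance, a practice that met an indicator in 1 of 1 eligible records scores (1 + 1) / (1 + 2) ≈ 0.67 after adjustment, so it ranks below one that met it in 10 of 10 (11/12 ≈ 0.92); this is how the method damps the effect of low encounter numbers.
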
7.
Eur J Public Health ; 34(1): 44-51, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-37875008

ABSTRACT

BACKGROUND: Value-based healthcare (VBHC) is a conceptual framework for improving the value of healthcare as measured by health, care-process and economic outcomes. Benchmarking should provide useful information for identifying best practices and is therefore a good instrument for improving quality across healthcare organizations. This paper aims to provide a proof-of-concept of the feasibility of international VBHC benchmarking in breast cancer, with the ultimate aim of sharing best practices through a data-driven approach among healthcare organizations from different health systems. METHODS: In the VOICE community, a European healthcare centre cluster intending to address VBHC from theory to practice, information on patient-reported, clinical-related, care-process-related and economic-related outcomes was collected. Patient archetypes were identified using clustering techniques, and an indicator set was defined following a modified Delphi process. Benchmarking was performed using regression models controlling for patient archetypes and socio-demographic characteristics. RESULTS: Six hundred and ninety patients from six healthcare centres were included. A set of 50 health, care-process and economic indicators was distilled for benchmarking. Statistically significant differences across sites were found in most health outcomes, half of the care-process indicators, and all economic indicators, allowing the best and worst performers to be identified. CONCLUSIONS: To the best of our knowledge, this is the first international experience providing evidence for VBHC benchmarking. Differences in indicators across healthcare centres should be used to identify best practices and improve healthcare quality following further research. The methods applied might help to advance VBHC benchmarking in other medical conditions.


Subject(s)
Benchmarking , Quality of Health Care , Humans , Benchmarking/methods , Delivery of Health Care
8.
Nat Methods ; 20(11): 1810-1821, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37783886

ABSTRACT

The lack of benchmark data sets with inbuilt ground truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines that were each profiled in triplicate together with synthetic, spliced, spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed the other tools among the six isoform detection tools tested; DESeq2, edgeR and limma-voom were best among the five differential transcript expression tools tested; and there was no clear front-runner for differential transcript usage analysis among the five tools compared, which suggests further methods development is needed for this application.


Subject(s)
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Benchmarking/methods , RNA , Protein Isoforms
9.
Stud Health Technol Inform ; 302: 167-171, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203640

ABSTRACT

Feedback of data quality measures to study sites is an established procedure in the management of registries. Comparisons of data quality between registries as a whole are missing. We implemented a cross-registry benchmarking of data quality within the field of health services research for six projects. Five (2020) and six (2021) quality indicators were selected from a national recommendation. The calculation of the indicators was adjusted to the registries' specific settings. Nineteen (2020) and 29 (2021) results could be included in the yearly quality report. Seventy-four per cent (2020) and 79% (2021) of the results did not include the threshold within their 95% confidence limits. The benchmarking revealed several starting points for a weak-point analysis, both through comparison of results with a predefined threshold and through comparison of the results with each other. In the future, cross-registry benchmarking might be part of the services provided through a health services research infrastructure.


Subject(s)
Benchmarking , Quality Indicators, Health Care , Benchmarking/methods , Registries , Data Collection , Data Accuracy
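
The registry report counted results whose 95% confidence limits excluded the threshold. The abstract does not say which interval was used; the sketch below assumes proportion-type indicators and uses the Wilson score interval, with z = 1.96 for 95% coverage. Function names are illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def excludes_threshold(successes: int, n: int, threshold: float, z: float = 1.96) -> bool:
    """True when the threshold lies outside the confidence interval."""
    low, high = wilson_interval(successes, n, z)
    return not (low <= threshold <= high)
```

For example, 50 successes out of 100 gives an interval of roughly 0.40 to 0.60, which excludes a threshold of 0.9 but not one of 0.5.
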
10.
Environ Health ; 22(1): 40, 2023 05 06.
Article in English | MEDLINE | ID: mdl-37147704

ABSTRACT

BACKGROUND: Exposure to perfluorinated alkylate substances (PFAS) is associated with harmful effects on human health, including developmental immunotoxicity. This outcome was chosen as the critical effect by the European Food Safety Authority (EFSA), which calculated a new joint reference dose for four PFAS using a Benchmark Dose (BMD) analysis of a study of 1-year-old children. However, the U.S. Environmental Protection Agency (EPA) recently proposed much lower exposure limits. METHODS: We explored the BMD methodology for summary and individual data and compared the results with and without grouping for the two available data sets. We compared the performance of different dose-response models, including a hockey-stick model and a piecewise linear model. We considered different ways of testing the assumption of equal weight-based toxicity of the four PFAS and evaluated more flexible models with exposure indices allowing for differences in toxicity. RESULTS: Results relying on full and decile-based data were in good accordance. However, BMD results for the larger study were lower than those observed by EFSA for the smaller study. EFSA derived a lower confidence limit for the BMD of 17.5 ng/mL for the sum of serum-PFAS concentration, while similar calculations in the larger cohort yielded values of about 1.5 ng/mL. As the assumption of equal weight-based toxicity of the four PFAS seems questionable, we confirmed dose-dependencies that allowed potency differences between PFAS. We also found that models linear in the parameters for the BMD analysis showed superior coverage probabilities. In particular, we found the piecewise linear model to be useful for Benchmark analysis. CONCLUSIONS: Both data sets considered could be analyzed on a decile basis without important bias or loss of power. The larger study showed substantially lower BMD results, both for individual PFAS and for joint exposures.
Overall, EFSA's proposed tolerable exposure limit appears too high, while the EPA proposal is in better accordance with the results.


Subject(s)
Benchmarking , Fluorocarbons , Child , Humans , Infant , Benchmarking/methods , Fluorocarbons/toxicity
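
To make the hockey-stick model concrete: the mean response stays at a background level a up to a threshold θ and then changes linearly with slope b. If the benchmark response (BMR) is defined as a relative drop r from background, solving a + b(BMD - θ) = a(1 - r) gives BMD = θ - r·a/b. The sketch below assumes a decreasing endpoint (b < 0) and a BMR of 5% purely for illustration; the study's actual BMR, the model fitting, and the lower confidence limit (BMDL) reported by EFSA are not reproduced here.

```python
def hockey_stick_mean(dose: float, background: float, threshold: float, slope: float) -> float:
    """Mean response: flat at `background` up to `threshold`, then linear with `slope`."""
    if dose <= threshold:
        return background
    return background + slope * (dose - threshold)


def hockey_stick_bmd(background: float, threshold: float, slope: float, bmr: float = 0.05) -> float:
    """Dose at which the mean has fallen by the fraction `bmr` below background.

    Assumes a positive background and a decreasing response (slope < 0); the 5% default
    BMR is an illustrative assumption, not the value used in the study.
    """
    if background <= 0 or slope >= 0:
        raise ValueError("sketch assumes a positive background and a negative slope")
    # Solve background + slope * (bmd - threshold) = background * (1 - bmr) for bmd
    return threshold - bmr * background / slope
```

For example, with a = 1.0, θ = 2.0 and b = -0.1, the BMD at a 5% BMR is 2.5, where the modelled mean has fallen to 0.95. A BMDL would additionally require the uncertainty of the fitted parameters, for example via profile likelihood or bootstrap.
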
11.
Nat Methods ; 20(3): 375-386, 2023 03.
Article in English | MEDLINE | ID: mdl-36864200

ABSTRACT

Analyzing proteins from single cells by tandem mass spectrometry (MS) has recently become technically feasible. While such analysis has the potential to accurately quantify thousands of proteins across thousands of single cells, the accuracy and reproducibility of the results may be undermined by numerous factors affecting experimental design, sample preparation, data acquisition and data analysis. We expect that broadly accepted community guidelines and standardized metrics will enhance rigor, data quality and alignment between laboratories. Here we propose best practices, quality controls and data-reporting recommendations to assist in the broad adoption of reliable quantitative workflows for single-cell proteomics. Resources and discussion forums are available at https://single-cell.net/guidelines .


Subject(s)
Benchmarking , Proteomics , Benchmarking/methods , Proteomics/methods , Reproducibility of Results , Proteins/analysis , Tandem Mass Spectrometry/methods , Proteome/analysis
12.
Int J Radiat Biol ; 99(9): 1320-1331, 2023.
Article in English | MEDLINE | ID: mdl-36881459

ABSTRACT

BACKGROUND: Exposure to different forms of ionizing radiation occurs in diverse occupational, medical, and environmental settings. Improving the accuracy of the estimated health risks associated with exposure is therefore essential for protecting the public, particularly with respect to chronic low-dose exposures. A key aspect of understanding health risks is precise and accurate modeling of the dose-response relationship. Toward this vision, benchmark dose (BMD) modeling may be a suitable approach for consideration in the radiation field. BMD modeling is already extensively used for chemical hazard assessments and is considered statistically preferable to identifying lowest- and no-observed-adverse-effect levels. BMD modeling involves fitting mathematical models to dose-response data for a relevant biological endpoint and identifying a point of departure (the BMD, or its lower bound). Recent examples in chemical toxicology show that when applied to molecular endpoints (e.g. genotoxic and transcriptional endpoints), BMDs correlate with points of departure for more apical endpoints such as phenotypic changes (e.g. adverse effects) of interest to regulatory decisions. This use of BMD modeling may be valuable to explore in the radiation field, specifically in combination with adverse outcome pathways, and may facilitate better interpretation of relevant in vivo and in vitro dose-response data. To advance this application, a workshop was organized on June 3rd, 2022, in Ottawa, Ontario, that brought together BMD experts in chemical toxicology and the radiation scientific community of researchers, regulators, and policy-makers. The workshop's objective was to introduce radiation scientists to BMD modeling and its practical application using case examples from the chemical toxicity field and to demonstrate the BMDExpress software using a radiation dataset.
Discussions focused on the BMD approach, the importance of experimental design, regulatory applications, its use in supporting the development of adverse outcome pathways, and specific radiation-relevant examples. CONCLUSIONS: Although further deliberations are needed to advance the use of BMD modeling in the radiation field, these initial discussions and partnerships highlight some key steps to guide future undertakings related to new experimental work.


Subject(s)
Benchmarking , Models, Theoretical , Benchmarking/methods , DNA Damage , Risk Assessment/methods , Dose-Response Relationship, Drug
13.
Nat Commun ; 14(1): 1570, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36944632

ABSTRACT

Integration of single-cell RNA sequencing data between different samples has been a major challenge for analyzing cell populations. However, strategies to integrate differential expression analysis of single-cell data remain underinvestigated. Here, we benchmark 46 workflows for differential expression analysis of single-cell data with multiple batches. We show that batch effects, sequencing depth and data sparsity substantially impact their performance. Notably, we find that the use of batch-corrected data rarely improves the analysis for sparse data, whereas batch covariate modeling improves the analysis for substantial batch effects. We show that for low-depth data, single-cell techniques based on zero-inflation models deteriorate the performance, whereas the analysis of uncorrected data using limmatrend, the Wilcoxon test and a fixed effects model performs well. We suggest several high-performance methods under different conditions based on various simulation and real data analyses. Additionally, we demonstrate that differential expression analysis for a specific cell type outperforms that of large-scale bulk sample data in prioritizing disease-related genes.


Subject(s)
Benchmarking , Data Analysis , Sequence Analysis, RNA/methods , Benchmarking/methods , Computer Simulation , Workflow , Single-Cell Analysis/methods , Gene Expression Profiling/methods
14.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36592056

ABSTRACT

Circular RNAs (circRNAs) are covalently closed transcripts involved in critical regulatory axes, cancer pathways and disease mechanisms. CircRNA expression measured with RNA-seq has particular characteristics that might hamper the performance of standard biostatistical differential expression assessment methods (DEMs). We compared 38 DEM pipelines configured to fit the statistical properties of circRNA expression data, including bulk RNA-seq, single-cell RNA-seq (scRNA-seq) and metagenomics DEMs. The DEMs performed poorly on data sets of typical size. Widely used DEMs, such as DESeq2, edgeR and Limma-Voom, gave scarce results, unreliable predictions or even contravened the expected behaviour with some parameter configurations. Limma-Voom achieved the most consistent performance across different benchmark data sets and, along with SAMseq, reasonably balanced false discovery rate (FDR) and recall rate. Interestingly, a few scRNA-seq DEMs obtained results comparable with the best-performing bulk RNA-seq tools. The performance of almost all DEMs improved as the number of replicates increased. CircRNA expression studies require careful design, choice of DEM and DEM configuration. This analysis can guide scientists in selecting the appropriate tools to investigate circRNA differential expression with RNA-seq experiments.


Subject(s)
Benchmarking , RNA, Circular , Benchmarking/methods , Sequence Analysis, RNA/methods , RNA-Seq , Metagenomics , RNA/genetics
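
The FDR and recall rates used to compare the pipelines can be computed from the set of circRNAs a method calls differentially expressed and the set known to be differentially expressed in a benchmark. A minimal sketch follows, with hypothetical circRNA identifiers; treating an empty call set as FDR 0 is a convention assumed here, not taken from the study.

```python
def fdr_and_recall(called: list[str], truly_de: list[str]) -> tuple[float, float]:
    """Return (false discovery rate, recall) for one method's differential expression calls."""
    called_set, true_set = set(called), set(truly_de)
    true_positives = len(called_set & true_set)
    # FDR: share of calls that are not truly differentially expressed (0 when nothing is called)
    fdr = (len(called_set) - true_positives) / len(called_set) if called_set else 0.0
    # Recall: share of truly differentially expressed circRNAs that were called
    recall = true_positives / len(true_set) if true_set else 0.0
    return fdr, recall
```

For example, calling four hypothetical circRNAs of which two are among three true positives gives an FDR of 0.5 and a recall of 2/3; a method that "reasonably balances" the two keeps FDR low without sacrificing too much recall.
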
15.
Nat Commun ; 14(1): 94, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36609502

ABSTRACT

A plethora of software suites and multiple classes of spectral libraries have been developed to enhance the depth and robustness of data-independent acquisition (DIA) data processing. However, how the combination of a DIA software tool and a spectral library impacts the outcome of DIA proteomics and phosphoproteomics data analysis has rarely been investigated using benchmark data that mimics biological complexity. In this study, we create DIA benchmark data sets simulating the regulation of thousands of proteins in a complex background, which are collected on both an Orbitrap and a timsTOF instrument. We evaluate four commonly used software suites (DIA-NN, Spectronaut, MaxDIA and Skyline) combined with seven different spectral libraries in global proteome analysis. Moreover, we assess their performance in analyzing phosphopeptide standards and TNF-α-induced phosphoproteome regulation. Our study provides practical guidance on how to construct a robust data analysis pipeline for different proteomics studies implementing the DIA technique.


Subject(s)
Benchmarking , Proteomics , Proteomics/methods , Benchmarking/methods , Workflow , Mass Spectrometry/methods , Software , Proteome/metabolism
16.
Clin Exp Optom ; 106(3): 276-282, 2023 04.
Article in English | MEDLINE | ID: mdl-35125062

ABSTRACT

CLINICAL RELEVANCE: Current levels of appropriateness for primary diabetic eyecare delivered by Australian optometrists are presented along with realistic targets (benchmarks) for quality improvement. The demonstrated methods can be used in practice evaluation and benchmarking of other clinical practice areas and settings. BACKGROUND: To examine the appropriateness of diabetic eye-care delivery and establish achievable benchmarks of care (ABCs) for optometry practices in Australia. METHOD: In a retrospective audit, clinical records of patients with type-II diabetes obtained from a randomly selected nationally representative sample of optometry practices were assessed against evidence-based clinical indicators. Appropriate care is defined as care delivered in compliance with the indicators. The ABC for each indicator was calculated as the average performance for the top 10% of optometry practices after Bayesian adjustment to account for a low number of eligible records. RESULTS: The audit of 420 randomly selected patient records from 42 practices against 12 clinical indicators showed an overall appropriateness of 69% (95% confidence interval (CI) 66%, 73%) for overall diabetic eye care. While a high level of appropriateness was identified for recall period (93%, 95% CI 85%, 100%) and referral (100%, 95% CI 38%, 100%), larger gaps existed in history taking (46%, 95% CI 44%, 52%), dilated fundus examination (80%, 95% CI 76%, 84%) and iris examination (0%, 95% CI 0%, 56%). The ABCs for 8 of 12 indicators were 100%, and the remaining three indicators had ABCs above 80%. An ABC for the iris examination indicator could not be calculated owing to the low number of eligible patient record cards. CONCLUSIONS: This study demonstrated a systematic process of practice evaluation and benchmarking in optometry practices. 
The diabetic eye care delivered by Australian optometrists was largely appropriate; however, improvement opportunities exist for history taking and physical examination. The ABCs demonstrate that excellence in primary diabetic eye care is attainable and will serve as an important tool in future initiatives to reduce the identified evidence-to-practice gaps.


Subject(s)
Diabetes Mellitus , Optometry , Humans , Retrospective Studies , Bayes Theorem , Australia/epidemiology , Benchmarking/methods , Diabetes Mellitus/epidemiology , Diabetes Mellitus/therapy
17.
J Behav Health Serv Res ; 50(1): 128-146, 2023 01.
Article in English | MEDLINE | ID: mdl-35835954

ABSTRACT

Performance management of mental health services (MHS) through quality reporting of strategic indicators and goals is essential to improve efficiency and quality of care. One such method is the balanced scorecard (BSC). This integrative review of peer-reviewed and industry-implemented BSCs in MHS aims to inform future development of a more comprehensive mental health-focused benchmarking tool. A two-part systematic search covered peer-reviewed published literature on MHS-specific BSCs, following the PRISMA guidelines, as well as industry-published BSCs available online. A total of 17 unique BSCs were identified. A total of 434 indicators were subjected to thematic analysis, identifying 11 key themes: prevalence, accessibility, services provided, clinical outcomes, client satisfaction, client involvement, staff motivation, staffing levels, governance and compliance, development, and costs and revenue. These themes represented the measures that MHS believed captured key performance criteria in alignment with their organisational objectives.


Subject(s)
Benchmarking , Mental Health Services , Humans , Benchmarking/methods
18.
Nat Commun ; 13(1): 6793, 2022 11 10.
Article in English | MEDLINE | ID: mdl-36357391

ABSTRACT

Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curate data for 3765 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trends towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks are prone to unforeseen bursts. We analyze attributes associated with benchmark popularity, and conclude that future benchmarks should emphasize versatility, breadth and real-world utility.


Subject(s)
Artificial Intelligence , Benchmarking , Benchmarking/methods , Ecosystem , Physical Phenomena
19.
Microb Genom ; 8(10)2022 10.
Article in English | MEDLINE | ID: mdl-36269282

ABSTRACT

Culture-independent metagenomic detection of microbial species has the potential to provide rapid and precise real-time diagnostic results. However, it is potentially limited by sequencing and taxonomic classification errors. We use simulated and real-world data to benchmark rates of species misclassification using 100 reference genomes for each of ten common bloodstream pathogens and six frequent blood-culture contaminants (n=1568; only 68 genomes were available for Micrococcus luteus). Simulating both with and without sequencing error for both the Illumina and Oxford Nanopore platforms, we evaluated commonly used classification tools including Kraken2, Bracken and Centrifuge, utilizing mini (8 GB) and standard (30-50 GB) databases. Bracken with the standard database performed best: the median percentage of reads across both sequencing platforms identified correctly to the species level was 97.8% (IQR 92.7:99.0) [range 5:100]. For Kraken2 with a mini database, a commonly used combination, median species-level identification was 86.4% (IQR 50.5:93.7) [range 4.3:100]. Classification performance varied by species, with Escherichia coli being more challenging to classify correctly (probability of reads being assigned to the correct species: 56.1-96.0%, varying by tool used). Human read misclassification was negligible. By filtering out shorter Nanopore reads we found performance similar or superior to Illumina sequencing, despite higher sequencing error rates. Misclassification was more common when the misclassified species had a higher average nucleotide identity to the true species. Our findings highlight that taxonomic misclassification of sequencing data occurs and varies by sequencing and analysis workflow. To account for 'bioinformatic contamination' we present a contamination catalogue that can be used in metagenomic pipelines to ensure accurate results that can support clinical decision making.


Subject(s)
Nanopores , Humans , Benchmarking/methods , Metagenomics/methods , High-Throughput Nucleotide Sequencing/methods , Nucleotides
20.
Adv Surg ; 56(1): 89-109, 2022 09.
Article in English | MEDLINE | ID: mdl-36096580

ABSTRACT

Efforts to improve quality in healthcare have arisen from the recognition that the quality of care delivered and resulting outcomes are highly variable. Performance benchmarking using high-quality data to compare risk-adjusted outcomes between hospitals and surgeons has been widely adopted as one means for addressing this problem. In this article we discuss the history, current state, methodologies, and potential pitfalls of benchmarking efforts to improve quality of healthcare in the United States.


Subject(s)
Benchmarking , Surgeons , Benchmarking/methods , Humans , United States