Results 1 - 20 of 19,302
1.
J Biomed Opt ; 30(Suppl 1): S13703, 2025 Jan.
Article in English | MEDLINE | ID: mdl-39034959

ABSTRACT

Significance: Standardization of fluorescence molecular imaging (FMI) is critical for ensuring quality control in guiding surgical procedures. To accurately evaluate system performance, two metrics, the signal-to-noise ratio (SNR) and contrast, are widely employed. However, there is currently no consensus on how these metrics can be computed. Aim: We aim to examine the impact of SNR and contrast definitions on the performance assessment of FMI systems. Approach: We quantified the SNR and contrast of six near-infrared FMI systems by imaging a multi-parametric phantom. Based on approaches commonly used in the literature, we quantified seven SNRs and four contrast values considering different background regions and/or formulas. Then, we calculated benchmarking (BM) scores and respective rank values for each system. Results: We show that the performance assessment of an FMI system changes depending on the background locations and the applied quantification method. For a single system, the different metrics can vary up to ∼35 dB (SNR), ∼8.65 a.u. (contrast), and ∼0.67 a.u. (BM score). Conclusions: The definition of precise guidelines for FMI performance assessment is imperative to ensure successful clinical translation of the technology. Such guidelines can also enable quality control for the already clinically approved indocyanine green-based fluorescence image-guided surgery.
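
To illustrate how the choice of background region and formula changes these metrics, the following Python sketch computes one commonly used SNR definition (in dB) and one commonly used contrast definition from simulated phantom pixel intensities. The region values and the specific formulas are illustrative assumptions, not the definitions used by the authors.

    import numpy as np

    rng = np.random.default_rng(0)
    signal_roi = rng.normal(loc=200.0, scale=5.0, size=1000)   # fluorescent inclusion pixels
    background_a = rng.normal(loc=20.0, scale=4.0, size=1000)  # background region A
    background_b = rng.normal(loc=35.0, scale=8.0, size=1000)  # background region B

    def snr_db(signal, background):
        # One frequently used definition: (mean signal - mean background) / background std, in dB.
        return 20 * np.log10((signal.mean() - background.mean()) / background.std())

    def contrast(signal, background):
        # Simple contrast ratio of mean intensities (a.u.).
        return signal.mean() / background.mean()

    for name, bg in [("region A", background_a), ("region B", background_b)]:
        print(f"{name}: SNR = {snr_db(signal_roi, bg):.1f} dB, contrast = {contrast(signal_roi, bg):.2f}")

Swapping the background region alone shifts both numbers, which is the kind of definition-dependent spread the study quantifies.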


Subjects
Benchmarking, Molecular Imaging, Optical Imaging, Imaging Phantoms, Signal-to-Noise Ratio, Molecular Imaging/methods, Molecular Imaging/standards, Optical Imaging/methods, Optical Imaging/standards, Computer-Assisted Image Processing/methods
2.
Front Public Health ; 12: 1363957, 2024.
Article in English | MEDLINE | ID: mdl-38952740

ABSTRACT

Background and aims: Laboratory performance as a relative concept needs repetitive benchmarking for continuous improvement of laboratory procedures and medical processes. Benchmarking as such establishes reference levels as a basis for improvement efforts for healthcare institutions along the diagnosis cycle, with the patient at its center. But while this concept seems to be generally acknowledged in laboratory medicine, a lack of practical implementation hinders progress at a global level. The aim of this study was to examine the utility of a specific combination of indicators and a survey-based data collection approach, and to establish a global benchmarking dataset of laboratory performance for decision makers in healthcare institutions. Methods: The survey consisted of 44 items relating to laboratory operations in general and three subscales identified in previous studies. A global sample of laboratories was approached by trained professionals. Results were analyzed with standard descriptive statistics and exploratory factor analysis. Dimensional reduction of specific items was performed using confirmatory factor analysis, resulting in individual laboratory scores for the three subscales of "Operational performance," "Integrated clinical care performance," and "Financial sustainability" for the high-level concept of laboratory performance. Results and conclusions: In total, 920 laboratories from 55 countries across the globe participated in the survey, of which 401 were government hospital laboratories, 296 private hospital laboratories, and 223 commercial laboratories. Relevant results include the need for digitalization and automation along the diagnosis cycle. Formal quality management systems (ISO 9001, ISO 15189, etc.) need to be adopted more broadly to increase patient safety. Monitoring of key performance indicators (KPIs) relating to healthcare performance was generally low (in the range of 10-30% of laboratories overall), and as a particularly salient result, only 19% of laboratories monitored KPIs relating to speeding up diagnosis and treatment. Altogether, this benchmark elucidates current practice and has the potential to guide improvement efforts and standardization in quality & safety for patients and employees alike, as well as sustainability of healthcare systems around the globe.


Subjects
Benchmarking, Humans, Surveys and Questionnaires, Clinical Laboratories/standards, Global Health
3.
Genome Biol ; 25(1): 169, 2024 07 01.
Article in English | MEDLINE | ID: mdl-38956606

ABSTRACT

BACKGROUND: Computational cell type deconvolution enables the estimation of cell type abundance from bulk tissues and is important for understanding the tissue microenvironment, especially in tumor tissues. With the rapid development of deconvolution methods, many benchmarking studies have been published aiming for a comprehensive evaluation of these methods. Benchmarking studies rely on cell-type-resolved single-cell RNA-seq data to create simulated pseudobulk datasets by adding individual cell types in controlled proportions. RESULTS: In our work, we show that the standard application of this approach, which uses randomly selected single cells regardless of the intrinsic differences between them, generates synthetic bulk expression values that lack appropriate biological variance. We demonstrate why and how the current bulk simulation pipeline with random cells is unrealistic and propose a heterogeneous simulation strategy as a solution. The heterogeneously simulated bulk samples match the variance observed in real bulk datasets and therefore provide concrete benefits for benchmarking in several ways. We demonstrate that conceptual classes of deconvolution methods differ dramatically in their robustness to heterogeneity, with reference-free methods performing particularly poorly. For regression-based methods, the heterogeneous simulation provides an explicit framework to disentangle the contributions of reference construction and regression methods to performance. Finally, we perform an extensive benchmark of diverse methods across eight different datasets and find BayesPrism and a hybrid MuSiC/CIBERSORTx approach to be the top performers. CONCLUSIONS: Our heterogeneous bulk simulation method and the entire benchmarking framework are implemented in a user-friendly package at https://github.com/humengying0907/deconvBenchmarking and https://doi.org/10.5281/zenodo.8206516, enabling further developments in deconvolution methods.
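
A minimal sketch of the contrast between random and heterogeneity-aware pseudobulk construction, using a toy expression matrix with cell-type labels; this is an illustration of the general idea only, not the deconvBenchmarking implementation.

    import numpy as np

    rng = np.random.default_rng(1)
    n_genes, cells_per_type = 500, 300
    cell_types = ["T", "B", "Tumor"]
    # Toy single-cell counts: each cell type has its own mean expression programme.
    profiles = {ct: rng.gamma(shape=2.0, scale=(i + 1), size=n_genes) for i, ct in enumerate(cell_types)}
    cells = {ct: rng.poisson(profiles[ct], size=(cells_per_type, n_genes)) for ct in cell_types}

    def pseudobulk(proportions, n_cells=200, subset_fraction=1.0):
        """Sum sampled cells per type; subset_fraction < 1 mimics a 'heterogeneous'
        simulation that draws each bulk sample from a different random subset of cells."""
        sample = np.zeros(n_genes)
        for ct, p in proportions.items():
            pool = cells[ct]
            keep = rng.choice(len(pool), size=max(1, int(len(pool) * subset_fraction)), replace=False)
            idx = rng.choice(keep, size=int(n_cells * p), replace=True)
            sample += pool[idx].sum(axis=0)
        return sample

    props = {"T": 0.5, "B": 0.3, "Tumor": 0.2}
    random_bulks = np.array([pseudobulk(props) for _ in range(20)])
    hetero_bulks = np.array([pseudobulk(props, subset_fraction=0.2) for _ in range(20)])
    # Heterogeneous sampling yields larger gene-level variance across simulated bulk samples.
    print(random_bulks.std(axis=0).mean(), hetero_bulks.std(axis=0).mean())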


Subjects
Benchmarking, Single-Cell Analysis, Single-Cell Analysis/methods, Humans, Computer Simulation, RNA-Seq/methods, Computational Biology/methods
4.
PLoS One ; 19(7): e0305856, 2024.
Article in English | MEDLINE | ID: mdl-38968250

ABSTRACT

Continual learning and few-shot learning are important frontiers in progress toward broader Machine Learning (ML) capabilities. Recently, there has been intense interest in combining both. One of the first examples to do so was the Continual Few-Shot Learning (CFSL) framework of Antoniou et al. (2020). In this study, we extend CFSL in two ways that capture a broader range of challenges important for intelligent agent behaviour in real-world conditions. First, we increased the number of classes by an order of magnitude, making the results more comparable to standard continual learning experiments. Second, we introduced an 'instance test', which requires recognition of specific instances of classes, a capability of animal cognition that is usually neglected in ML. For an initial exploration of ML model performance under these conditions, we selected representative baseline models from the original CFSL work and added a model variant with replay. As expected, learning more classes is more difficult than in the original CFSL experiments, and interestingly, the way in which image instances and classes are presented affects classification performance. Surprisingly, accuracy in the baseline instance test is comparable to that of other classification tasks, but poor given significant occlusion and noise. The use of replay for consolidation substantially improves performance for both types of tasks, but particularly for the instance test.


Subjects
Benchmarking, Machine Learning, Animals, Algorithms
5.
BMC Public Health ; 24(1): 1790, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38970046

ABSTRACT

BACKGROUND: Aboriginal and Torres Strait Islander communities in remote Australia have initiated bold policies for health-enabling stores. Benchmarking, a data-driven and facilitated 'audit and feedback' with action planning process, provides a potential strategy to strengthen and scale health-enabling best-practice adoption by remote community store directors/owners. We aim to co-design a benchmarking model with five partner organisations and test its effectiveness with Aboriginal and Torres Strait Islander community stores in remote Australia. METHODS: The study design is a pragmatic randomised controlled trial with consenting eligible stores (located in the very remote Northern Territory (NT) of Australia, the primary grocery store for an Aboriginal community, and serviced by a Nutrition Practitioner with a study partner organisation). The benchmarking model is informed by research evidence, purpose-built best-practice audit and feedback tools, and co-designed with partner organisation and community representatives. The intervention comprises two full benchmarking cycles (one per year, 2022/23 and 2023/24) of assessment, feedback, action planning and action implementation. Assessment of stores includes (i) adoption status of 21 evidence- and industry-informed health-enabling policies for remote stores, (ii) implementation of health-enabling best practice using a purpose-built Store Scout App, (iii) price of a standardised healthy diet using the Aboriginal and Torres Strait Islander Healthy Diets ASAP protocol, and (iv) healthiness of food purchasing using sales data indicators. Partner organisations feed back reports to stores and co-design action plans with them. Control stores receive assessments and continue with usual retail practice. All stores provide weekly electronic sales data to assess the primary outcome, the change in free sugars (g) to energy (MJ) from all food and drinks purchased, baseline (July-December 2021) vs July-December 2023. DISCUSSION: We hypothesise that the benchmarking intervention can improve the adoption of health-enabling store policy and practice and reduce sales of unhealthy foods and drinks in remote community stores of Australia. This innovative research with remote Aboriginal and Torres Strait Islander communities can inform effective implementation strategies for healthy food retail more broadly. TRIAL REGISTRATION: ACTRN12622000596707, Protocol version 1.
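
The primary outcome is a ratio computed from weekly sales data. Below is a small pandas sketch of that calculation under assumed column names; the trial's actual data dictionary and aggregation rules are not reproduced here.

    import pandas as pd

    # Hypothetical weekly sales extract: grams of free sugars and energy (MJ) per store and period.
    sales = pd.DataFrame({
        "store": ["A", "A", "B", "B"],
        "period": ["baseline", "followup", "baseline", "followup"],
        "free_sugars_g": [12500.0, 9800.0, 15200.0, 14900.0],
        "energy_mj": [820.0, 790.0, 900.0, 870.0],
    })

    # Primary outcome: free sugars (g) per MJ of energy from all food and drinks purchased.
    outcome = (sales.groupby(["store", "period"])[["free_sugars_g", "energy_mj"]].sum()
                    .assign(sugars_g_per_mj=lambda d: d["free_sugars_g"] / d["energy_mj"]))
    change = outcome["sugars_g_per_mj"].unstack("period").eval("followup - baseline")
    print(outcome)
    print(change)  # per-store change in the outcome, followup minus baseline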


Subjects
Benchmarking, Healthy Diet, Food Supply, Humans, Australia, Australian Aboriginal and Torres Strait Islander Peoples, Commerce, Food Supply/standards, Rural Population, Randomized Controlled Trials as Topic
6.
Nat Commun ; 15(1): 6167, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039053

ABSTRACT

Translating RNA-seq into clinical diagnostics requires ensuring the reliability and cross-laboratory consistency of detecting clinically relevant subtle differential expressions, such as those between different disease subtypes or stages. As part of the Quartet project, we present an RNA-seq benchmarking study across 45 laboratories using the Quartet and MAQC reference samples spiked with ERCC controls. Based on multiple types of 'ground truth', we systematically assess the real-world RNA-seq performance and investigate the influencing factors involved in 26 experimental processes and 140 bioinformatics pipelines. Here we show greater inter-laboratory variations in detecting subtle differential expressions among the Quartet samples. Experimental factors including mRNA enrichment and strandedness, and each bioinformatics step, emerge as primary sources of variations in gene expression. We underscore the profound influence of experimental execution, and provide best practice recommendations for experimental designs, strategies for filtering low-expression genes, and the optimal gene annotation and analysis pipelines. In summary, this study lays the foundation for developing and quality control of RNA-seq for clinical diagnostic purposes.


Subjects
Benchmarking, Computational Biology, Quality Control, RNA-Seq, Reference Standards, Benchmarking/methods, Humans, RNA-Seq/methods, RNA-Seq/standards, Computational Biology/methods, Reproducibility of Results, RNA Sequence Analysis/methods, RNA Sequence Analysis/standards, Gene Expression Profiling/methods, Gene Expression Profiling/standards, Messenger RNA/genetics, Messenger RNA/metabolism
7.
JMIR Ment Health ; 11: e57306, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39042893

ABSTRACT

BACKGROUND: Comprehensive session summaries enable effective continuity in mental health counseling, facilitating informed therapy planning. However, manual summarization presents a significant challenge, diverting experts' attention from the core counseling process. Leveraging advances in automatic summarization to streamline the summarization process addresses this issue because this enables mental health professionals to access concise summaries of lengthy therapy sessions, thereby increasing their efficiency. However, existing approaches often overlook the nuanced intricacies inherent in counseling interactions. OBJECTIVE: This study evaluates the effectiveness of state-of-the-art large language models (LLMs) in selectively summarizing various components of therapy sessions through aspect-based summarization, aiming to benchmark their performance. METHODS: We first created Mental Health Counseling-Component-Guided Dialogue Summaries, a benchmarking data set that consists of 191 counseling sessions with summaries focused on 3 distinct counseling components (also known as counseling aspects). Next, we assessed the capabilities of 11 state-of-the-art LLMs in addressing the task of counseling-component-guided summarization. The generated summaries were evaluated quantitatively using standard summarization metrics and verified qualitatively by mental health professionals. RESULTS: Our findings demonstrated the superior performance of task-specific LLMs such as MentalLlama, Mistral, and MentalBART evaluated using standard quantitative metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-1, ROUGE-2, ROUGE-L, and Bidirectional Encoder Representations from Transformers Score across all aspects of the counseling components. Furthermore, expert evaluation revealed that Mistral superseded both MentalLlama and MentalBART across 6 parameters: affective attitude, burden, ethicality, coherence, opportunity costs, and perceived effectiveness. However, these models exhibit a common weakness in terms of room for improvement in the opportunity costs and perceived effectiveness metrics. CONCLUSIONS: While LLMs fine-tuned specifically on mental health domain data display better performance based on automatic evaluation scores, expert assessments indicate that these models are not yet reliable for clinical application. Further refinement and validation are necessary before their implementation in practice.
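
For readers unfamiliar with the quantitative metrics named above, here is a hedged, from-scratch sketch of ROUGE-1 (unigram overlap) scoring between a generated and a reference summary. Real evaluations typically use an established implementation; this simplified version only illustrates the idea.

    from collections import Counter

    def rouge1(candidate: str, reference: str) -> dict:
        """Unigram-overlap ROUGE-1 precision, recall and F1 (simplified: lowercase, whitespace tokens)."""
        cand, ref = Counter(candidate.lower().split()), Counter(reference.lower().split())
        overlap = sum((cand & ref).values())
        precision = overlap / max(sum(cand.values()), 1)
        recall = overlap / max(sum(ref.values()), 1)
        f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
        return {"precision": precision, "recall": recall, "f1": f1}

    print(rouge1("patient reports reduced anxiety after breathing exercises",
                 "the patient reported reduced anxiety following breathing exercises"))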


Subjects
Benchmarking, Counseling, Humans, Counseling/methods, Adult, Mental Disorders/therapy, Female
8.
Genome Biol ; 25(1): 192, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39030569

ABSTRACT

BACKGROUND: CRISPR-Cas9 dropout screens are formidable tools for investigating biology with unprecedented precision and scale. However, biases in data lead to potential confounding effects on interpretation and compromise overall quality. The activity of Cas9 is influenced by structural features of the target site, including copy number amplifications (CN bias). More worryingly, proximal targeted loci tend to generate similar gene-independent responses to CRISPR-Cas9 targeting (proximity bias), possibly due to Cas9-induced whole chromosome-arm truncations or other genomic structural features and different chromatin accessibility levels. RESULTS: We benchmarked eight computational methods, rigorously evaluating their ability to reduce both CN and proximity bias in the two largest publicly available cell-line-based CRISPR-Cas9 screens to date. We also evaluated the capability of each method to preserve data quality and heterogeneity by assessing the extent to which the processed data allows accurate detection of true positive essential genes, established oncogenetic addictions, and known/novel biomarkers of cancer dependency. Our analysis sheds light on the ability of each method to correct biases under different scenarios. AC-Chronos outperforms other methods in correcting both CN and proximity biases when jointly processing multiple screens of models with available CN information, whereas CRISPRcleanR is the top performing method for individual screens or when CN information is not available. In addition, Chronos and AC-Chronos yield a final dataset better able to recapitulate known sets of essential and non-essential genes. CONCLUSIONS: Overall, our investigation provides guidance for the selection of the most appropriate bias-correction method, based on its strengths, weaknesses and experimental settings.


Subjects
Benchmarking, CRISPR-Cas Systems, Humans, Computational Biology/methods, Bias
9.
BMC Genomics ; 25(1): 679, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978005

ABSTRACT

BACKGROUND: Oxford Nanopore provides high throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain if they obtain the accuracy needed for the high-resolution source tracking of foodborne illness outbreaks. RESULTS: We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy for reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near perfect accuracy (99.9999% accuracy or ~ 5 nucleotide errors across the 4.8 Mbp genome, excluding low confidence regions) was only obtained by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered i.e., using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct. CONCLUSIONS: Short reads are still needed to correct errors in nanopore sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.
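
To make the reported accuracy concrete, 99.9999% per-base accuracy on a 4.8 Mbp genome corresponds to roughly five residual errors, as the back-of-the-envelope check below shows (illustrative arithmetic only, not the authors' evaluation code).

    genome_length_bp = 4_800_000
    per_base_accuracy = 0.999999          # 99.9999%
    expected_errors = genome_length_bp * (1 - per_base_accuracy)
    print(f"~{expected_errors:.1f} expected nucleotide errors")  # ~4.8, i.e. about 5 errors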


Subjects
Benchmarking, Disease Outbreaks, Bacterial Genome, Nanopores, Nanopore Sequencing/methods, High-Throughput Nucleotide Sequencing/methods, Salmonella enterica/genetics, Salmonella enterica/isolation & purification, Humans, Phylogeny
10.
Rev Lat Am Enfermagem ; 32: e4221, 2024.
Article in English, Spanish, Portuguese | MEDLINE | ID: mdl-38985044

ABSTRACT

OBJECTIVE: to map the content and features of mobile applications for the management of Diabetes Mellitus and their usability on the main operating systems. METHOD: benchmarking research. The mapping of apps, content, and resources on the Play Store and App Store platforms was based on an adaptation of the Joanna Briggs Institute's scoping review framework. For the usability analysis, the apps were tested for two weeks using the System Usability Scale, with scores of 50-67 points considered borderline, 68-84 indicating acceptable usability, and above 85 indicating excellent user acceptance; descriptive statistics were used for the analysis. RESULTS: the most prevalent contents were capillary blood glucose management, diet, oral drug therapy, and insulin therapy. As for resources, diaries and graphs were the most common. With regard to usability, two apps were rated as having excellent usability; 34 as products with acceptable usability; 29 as having some flaws but still meeting acceptable usability standards; and 6 as having flaws that left them unusable. CONCLUSION: the content and resources of the mobile applications address the fundamental points of Diabetes Mellitus management with user-friendly resources and usability acceptable to users, and they have the potential to assist in the management of Diabetes Mellitus in patients' daily lives.
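
The System Usability Scale score used here is computed from ten 1-5 Likert items with the standard rule (odd items contribute score - 1, even items contribute 5 - score, total scaled by 2.5 to a 0-100 range). A short sketch, with the band labels taken from the cut-offs quoted in the abstract:

    def sus_score(responses):
        """responses: ten integers from 1 to 5, in questionnaire order (items 1..10)."""
        if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
            raise ValueError("SUS needs ten responses on a 1-5 scale")
        contributions = [(r - 1) if i % 2 == 0 else (5 - r)  # items 1, 3, 5, ... are positively worded
                         for i, r in enumerate(responses)]
        return sum(contributions) * 2.5  # 0-100 scale

    score = sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1])
    band = ("excellent" if score > 85 else
            "acceptable" if score >= 68 else
            "borderline" if score >= 50 else "not acceptable")
    print(score, band)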


Subjects
Benchmarking, Diabetes Mellitus, Mobile Applications, Humans, Mobile Applications/standards, Diabetes Mellitus/therapy
11.
Genome Biol ; 25(1): 172, 2024 07 01.
Article in English | MEDLINE | ID: mdl-38951922

ABSTRACT

BACKGROUND: Computational variant effect predictors offer a scalable and increasingly reliable means of interpreting human genetic variation, but concerns of circularity and bias have limited previous methods for evaluating and comparing predictors. Population-level cohorts of genotyped and phenotyped participants that have not been used in predictor training can facilitate an unbiased benchmarking of available methods. Using a curated set of human gene-trait associations with a reported rare-variant burden association, we evaluate the correlations of 24 computational variant effect predictors with associated human traits in the UK Biobank and All of Us cohorts. RESULTS: AlphaMissense outperformed all other predictors in inferring human traits based on rare missense variants in UK Biobank and All of Us participants. The overall rankings of computational variant effect predictors in these two cohorts showed a significant positive correlation. CONCLUSION: We describe a method to assess computational variant effect predictors that sidesteps the limitations of previous evaluations. This approach is generalizable to future predictors and could continue to inform predictor choice for personal and clinical genetics.
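
A hedged sketch of the kind of rank-based comparison described: correlating per-participant rare-variant burden scores from a predictor (simulated here) with a quantitative trait. The variable names, the burden definition, and the simulated data are illustrative assumptions, not the study's pipeline.

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(42)
    n_participants = 2000
    # Simulated per-participant burden: e.g. the sum of predictor scores over rare missense variants carried.
    true_burden = rng.gamma(shape=1.5, scale=1.0, size=n_participants)
    predictor_burden = true_burden + rng.normal(scale=0.8, size=n_participants)   # noisy predictor
    trait = 0.3 * true_burden + rng.normal(scale=1.0, size=n_participants)        # quantitative trait

    rho, pval = spearmanr(predictor_burden, trait)
    print(f"Spearman rho = {rho:.2f}, p = {pval:.2e}")  # stronger predictors yield stronger trait correlation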


Subjects
Benchmarking, Genetic Variation, Humans, Phenotype, Computational Biology/methods, Genotype
12.
NPJ Syst Biol Appl ; 10(1): 73, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38997321

ABSTRACT

Immunoglobulins (Ig), which exist either as B-cell receptors (BCR) on the surface of B cells or as antibodies when secreted, play a key role in the recognition and response to antigenic threats. The capability to jointly characterize the BCR and antibody repertoire is crucial for understanding human adaptive immunity. From peripheral blood, bulk BCR sequencing (bulkBCR-seq) currently provides the highest sampling depth, single-cell BCR sequencing (scBCR-seq) allows for paired chain characterization, and antibody peptide sequencing by tandem mass spectrometry (Ab-seq) provides information on the composition of secreted antibodies in the serum. Yet, it has not been benchmarked to what extent the datasets generated by these three technologies overlap and complement each other. To address this question, we isolated peripheral blood B cells from healthy human donors and sequenced BCRs at bulk and single-cell levels, in addition to utilizing publicly available sequencing data. Integrated analysis was performed on these datasets, resolved by replicates and across individuals. Simultaneously, serum antibodies were isolated, digested with multiple proteases, and analyzed with Ab-seq. Systems immunology analysis showed high concordance in repertoire features between bulk and scBCR-seq within individuals, especially when replicates were utilized. In addition, Ab-seq identified clonotype-specific peptides using both bulk and scBCR-seq library references, demonstrating the feasibility of combining scBCR-seq and Ab-seq for reconstructing paired-chain Ig sequences from the serum antibody repertoire. Collectively, our work serves as a proof-of-principle for combining bulk sequencing, single-cell sequencing, and mass spectrometry as complementary methods towards capturing humoral immunity in its entirety.


Subjects
B-Lymphocytes, Benchmarking, Proteomics, B-Cell Antigen Receptors, Single-Cell Analysis, Humans, B-Cell Antigen Receptors/genetics, B-Cell Antigen Receptors/immunology, Proteomics/methods, B-Lymphocytes/immunology, Single-Cell Analysis/methods, Antibodies/immunology, Antibodies/genetics, Genomics/methods, Tandem Mass Spectrometry/methods
13.
Genome Biol ; 25(1): 159, 2024 06 17.
Article in English | MEDLINE | ID: mdl-38886757

ABSTRACT

BACKGROUND: The advent of single-cell RNA-sequencing (scRNA-seq) has driven significant computational methods development for all steps in the scRNA-seq data analysis pipeline, including filtering, normalization, and clustering. The large number of methods and their resulting parameter combinations has created a combinatorial set of possible pipelines to analyze scRNA-seq data, which leads to the obvious question: which is best? Several benchmarking studies compare methods but frequently find variable performance depending on dataset and pipeline characteristics. Alternatively, the large number of scRNA-seq datasets along with advances in supervised machine learning raise a tantalizing possibility: could the optimal pipeline be predicted for a given dataset? RESULTS: Here, we begin to answer this question by applying 288 scRNA-seq analysis pipelines to 86 datasets and quantifying pipeline success via a range of measures evaluating cluster purity and biological plausibility. We build supervised machine learning models to predict pipeline success given a range of dataset and pipeline characteristics. We find that prediction performance is significantly better than random and that in many cases pipelines predicted to perform well provide clustering outputs similar to expert-annotated cell type labels. We identify characteristics of datasets that correlate with strong prediction performance that could guide when such prediction models may be useful. CONCLUSIONS: Supervised machine learning models have utility for recommending analysis pipelines and therefore the potential to alleviate the burden of choosing from the near-infinite number of possibilities. Different aspects of datasets influence the predictive performance of such models which will further guide users.
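
A minimal sketch of the supervised idea described (predicting a pipeline-success score from dataset and pipeline characteristics) using scikit-learn. The feature names and the synthetic target are assumptions for illustration, not the paper's feature set.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 600  # dataset x pipeline combinations
    X = np.column_stack([
        rng.integers(500, 50000, n),        # number of cells in the dataset
        rng.uniform(0.5, 0.95, n),          # sparsity of the count matrix
        rng.integers(0, 3, n),              # normalization method (encoded)
        rng.integers(0, 4, n),              # clustering method (encoded)
    ])
    # Synthetic "pipeline success" score (e.g. a cluster-purity measure) with some structure plus noise.
    y = 0.5 + 0.1 * (X[:, 3] == 2) + 0.05 * np.log10(X[:, 0]) / 5 + rng.normal(0, 0.05, n)

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print("cross-validated R^2:", scores.mean().round(3))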


Subjects
Benchmarking, RNA-Seq, Single-Cell Analysis, Single-Cell Analysis/methods, RNA-Seq/methods, Humans, Supervised Machine Learning, RNA Sequence Analysis/methods, Cluster Analysis, Computational Biology/methods, Machine Learning, Animals, Single-Cell Gene Expression Analysis
14.
JCO Clin Cancer Inform ; 8: e2300174, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38870441

ABSTRACT

PURPOSE: The quality of radiotherapy auto-segmentation training data, primarily derived from clinician observers, is of utmost importance. However, the factors influencing the quality of clinician-derived segmentations are poorly understood; our study aims to quantify these factors. METHODS: Organ at risk (OAR) and tumor-related segmentations provided by radiation oncologists from the Contouring Collaborative for Consensus in Radiation Oncology data set were used. Segmentations were derived from five disease sites: breast, sarcoma, head and neck (H&N), gynecologic (GYN), and GI. Segmentation quality was determined on a structure-by-structure basis by comparing the observer segmentations with an expert-derived consensus, which served as a reference standard benchmark. The Dice similarity coefficient (DSC) was primarily used as a metric for the comparisons. DSC was stratified into binary groups on the basis of structure-specific expert-derived interobserver variability (IOV) cutoffs. Generalized linear mixed-effects models using Bayesian estimation were used to investigate the association between demographic variables and the binarized DSC for each disease site. Variables with a highest density interval excluding zero were considered to substantially affect the outcome measure. RESULTS: Five hundred seventy-four, 110, 452, 112, and 48 segmentations were used for the breast, sarcoma, H&N, GYN, and GI cases, respectively. The median percentage of segmentations that crossed the expert DSC IOV cutoff when stratified by structure type was 55% and 31% for OARs and tumors, respectively. Regression analysis revealed that the structure being tumor-related had a substantial negative impact on binarized DSC for the breast, sarcoma, H&N, and GI cases. There were no recurring relationships between segmentation quality and demographic variables across the cases, with most variables demonstrating large standard deviations. CONCLUSION: Our study highlights substantial uncertainty surrounding conventionally presumed factors influencing segmentation quality relative to benchmarks.
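
Since the analysis hinges on the Dice similarity coefficient, here is a brief sketch of DSC = 2|A∩B| / (|A| + |B|) computed on binary segmentation masks; the masks are synthetic and purely illustrative.

    import numpy as np

    def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
        """Dice similarity coefficient between two boolean segmentation masks."""
        a, b = mask_a.astype(bool), mask_b.astype(bool)
        denom = a.sum() + b.sum()
        return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

    observer = np.zeros((64, 64), dtype=bool); observer[20:40, 20:40] = True    # observer contour
    consensus = np.zeros((64, 64), dtype=bool); consensus[22:42, 22:42] = True  # expert consensus contour
    print(f"DSC = {dice(observer, consensus):.3f}")  # compared against an IOV-based cutoff in the study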


Subjects
Bayes Theorem, Benchmarking, Radiation Oncologists, Humans, Benchmarking/methods, Female, Computer-Assisted Radiotherapy Planning/methods, Neoplasms/epidemiology, Neoplasms/radiotherapy, Organs at Risk, Male, Radiation Oncology/standards, Radiation Oncology/methods, Demography, Observer Variation
15.
J Breath Res ; 18(4)2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38876091

ABSTRACT

The Peppermint Initiative, established within the International Association of Breath Research, introduced the peppermint protocol, a breath analysis benchmarking effort designed to address the lack of inter-comparability of outcomes across different breath sampling techniques and analytical platforms. Benchmarking with gas chromatography-ion mobility spectrometry (GC-IMS) using peppermint has been previously reported; however, coupling micro-thermal desorption (µTD) to GC-IMS has not yet been benchmarked for breath analysis. The aim of this work was to benchmark µTD-GC-IMS for breath analysis using the peppermint protocol. Ten healthy participants (4 males and 6 females, aged 20-73 years) were enrolled to give six breath samples into Nalophan bags via a modified peppermint protocol. Breath sampling after peppermint ingestion occurred over 6 h at t = 60, 120, 200, 280, and 360 min. The breath samples (120 cm3) were pre-concentrated in the µTD before being transferred into the GC-IMS for detection. Data were processed using VOCal, including background subtractions, peak volume measurements, and room air assessment. During peppermint washout, eucalyptol showed the highest change in concentration levels, followed by α-pinene and β-pinene. The reproducibility of the technique for breath analysis was demonstrated by constructing logarithmic washout curves, with an average linearity coefficient of R2 = 0.99. The time to baseline (benchmark) value for the eucalyptol washout was 1111 min (95% CI: 529-1693 min), obtained by extrapolating the average logarithmic washout curve. The study demonstrated that µTD-GC-IMS is a reproducible and suitable technique for breath analysis, with benchmark values for eucalyptol comparable to the gold standard GC-MS.
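
A sketch of the washout-curve handling described: fitting a logarithmic decay y = a·ln(t) + b to post-ingestion intensities and extrapolating to a baseline level to obtain a time-to-baseline benchmark. The numbers below are made up, and the paper's exact fitting choices may differ.

    import numpy as np

    t = np.array([60.0, 120.0, 200.0, 280.0, 360.0])   # minutes after peppermint ingestion
    y = np.array([9.0, 6.8, 5.4, 4.3, 3.6])            # eucalyptol peak volume (a.u.), synthetic
    baseline = 0.5                                      # assumed pre-ingestion level

    a, b = np.polyfit(np.log(t), y, deg=1)              # fit y = a*ln(t) + b
    pred = a * np.log(t) + b
    r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    t_baseline = np.exp((baseline - b) / a)             # solve a*ln(t) + b = baseline for t
    print(f"R2 = {r2:.3f}, extrapolated time to baseline ~ {t_baseline:.0f} min")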


Subjects
Benchmarking, Breath Tests, Mentha piperita, Humans, Breath Tests/methods, Breath Tests/instrumentation, Female, Male, Adult, Middle Aged, Aged, Ion Mobility Spectrometry/methods, Ion Mobility Spectrometry/standards, Young Adult, Gas Chromatography-Mass Spectrometry/methods, Gas Chromatography/methods, Gas Chromatography/instrumentation, Gas Chromatography/standards
16.
Article in English | MEDLINE | ID: mdl-38900611

ABSTRACT

In the context of neurorehabilitation, there have been rapid and continuous improvements in sensor-based clinical tools to quantify limb performance. As a result of the increasing integration of technologies into the assessment procedure, the need to integrate evidence-based medicine with benchmarking has emerged in the scientific community. In this work, we present the experimental validation of our previously proposed benchmarking scheme for upper limb capabilities in terms of repeatability, reproducibility, and clinical meaningfulness. We performed a prospective multicenter study on neurologically intact young and elderly subjects and post-stroke patients while recording kinematics and electromyography. Sixty subjects (30 young healthy, 15 elderly healthy, and 15 post-stroke) completed the benchmarking protocol. The framework was repeatable across different assessors and instrumentation. Age did not significantly impact the performance indicators of the scheme for healthy subjects. In post-stroke subjects, the movements presented decreased smoothness and speed, the movement amplitude was reduced, and the muscular activation showed lower power and lower intra-limb coordination. We revised the original framework, reducing it to three motor skills, and extracted 14 significant performance indicators with a good correlation with the ARAT clinical scale. The applicability of the scheme is wide, and it may be considered a valuable tool for upper limb functional evaluation in the clinical routine.


Subjects
Benchmarking, Electromyography, Stroke Rehabilitation, Stroke, Upper Extremity, Humans, Male, Female, Pilot Projects, Stroke Rehabilitation/methods, Electromyography/methods, Adult, Upper Extremity/physiopathology, Aged, Middle Aged, Reproducibility of Results, Stroke/complications, Stroke/physiopathology, Biomechanical Phenomena, Prospective Studies, Young Adult, Healthy Volunteers, Movement/physiology, Motor Skills/physiology, Algorithms
17.
Nature ; 630(8018): 841-846, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38839963

ABSTRACT

The development of neural techniques has opened up new avenues for research in machine translation. Today, neural machine translation (NMT) systems can leverage highly multilingual capacities and even perform zero-shot translation, delivering promising results in terms of language coverage and quality. However, scaling quality NMT requires large volumes of parallel bilingual data, which are not equally available for the 7,000+ languages in the world [1]. Focusing on improving the translation quality of a relatively small group of high-resource languages comes at the expense of directing research attention to low-resource languages, exacerbating digital inequities in the long run. To break this pattern, here we introduce No Language Left Behind, a single massively multilingual model that leverages transfer learning across languages. We developed a conditional computational model based on the Sparsely Gated Mixture of Experts architecture [2-7], which we trained on data obtained with new mining techniques tailored for low-resource languages. Furthermore, we devised multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. We evaluated the performance of our model over 40,000 translation directions using tools created specifically for this purpose: an automatic benchmark (FLORES-200), a human evaluation metric (XSTS) and a toxicity detector that covers every language in our model. Compared with the previous state-of-the-art models, our model achieves an average of 44% improvement in translation quality as measured by BLEU. By demonstrating how to scale NMT to 200 languages and making all contributions in this effort freely available for non-commercial use, our work lays important groundwork for the development of a universal translation system.


Assuntos
Multilinguismo , Processamento de Linguagem Natural , Redes Neurais de Computação , Tradução , Benchmarking
18.
J Robot Surg ; 18(1): 271, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937307

ABSTRACT

We investigated the use of robotic objective performance metrics (OPM) to predict number of cases to proficiency and independence among abdominal transplant fellows performing robot-assisted donor nephrectomy (RDN). 101 RDNs were performed by 5 transplant fellows from September 2020 to October 2023. OPM included fellow percent active control time (%ACT) and handoff counts (HC). Proficiency was defined as ACT ≥ 80% and HC ≤ 2, and independence as ACT ≥ 99% and HC ≤ 1. Case number was significantly associated with increasing fellow %ACT, with proficiency estimated at 14 cases and independence at 32 cases (R2 = 0.56, p < 0.001). Similarly, case number was significantly associated with decreasing HC, with proficiency at 18 cases and independence at 33 cases (R2 = 0.29, p < 0.001). Case number was not associated with total active console time (p = 0.91). Patient demographics, operative characteristics, and outcomes were not associated with OPM, except for donor estimated blood loss (EBL), which positively correlated with HC. Abdominal transplant fellows demonstrated proficiency at 14-18 cases and independence at 32-33 cases. Total active console time remained unchanged, suggesting that increasing fellow autonomy does not impede operative efficiency. These findings may serve as a benchmark for training abdominal transplant surgery fellows independently and safely in RDN.
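
A hedged sketch of estimating the case count at which a fellow crosses a proficiency threshold by regressing %ACT on case number and solving for the threshold. The data are synthetic and the study's exact regression model may differ.

    import numpy as np

    rng = np.random.default_rng(3)
    case_number = np.arange(1, 36)
    # Synthetic fellow %ACT rising with experience, plus noise.
    pct_act = 60 + 1.2 * case_number + rng.normal(0, 4, case_number.size)

    slope, intercept = np.polyfit(case_number, pct_act, deg=1)
    for threshold, label in [(80, "proficiency"), (99, "independence")]:
        cases_needed = (threshold - intercept) / slope
        print(f"{label}: about {cases_needed:.0f} cases (linear fit)")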


Assuntos
Competência Clínica , Doadores Vivos , Nefrectomia , Procedimentos Cirúrgicos Robóticos , Nefrectomia/métodos , Nefrectomia/educação , Humanos , Procedimentos Cirúrgicos Robóticos/educação , Procedimentos Cirúrgicos Robóticos/métodos , Feminino , Masculino , Transplante de Rim/métodos , Transplante de Rim/educação , Pessoa de Meia-Idade , Adulto , Benchmarking , Bolsas de Estudo
19.
BMC Health Serv Res ; 24(1): 770, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38943091

ABSTRACT

BACKGROUND: Current processes collecting cancer stage data in population-based cancer registries (PBCRs) lack standardisation, resulting in difficulty utilising diverse data sources and incomplete, low-quality data. Implementing a cancer staging tiered framework aims to improve stage collection and facilitate inter-PBCR benchmarking. OBJECTIVE: Demonstrate the application of a cancer staging tiered framework in the Western Australian Cancer Staging Project to establish a standardised method for collecting cancer stage at diagnosis data in PBCRs. METHODS: The tiered framework, developed in collaboration with a Project Advisory Group and applied to breast, colorectal, and melanoma cancers, provides business rules - procedures for stage collection. Tier 1 represents the highest staging level, involving complete American Joint Committee on Cancer (AJCC) tumour-node-metastasis (TNM) data collection and other critical staging information. Tier 2 (registry-derived stage) relies on supplementary data, including hospital admission data, to make assumptions based on data availability. Tier 3 (pathology stage) solely uses pathology reports. FINDINGS: The tiered framework promotes flexible utilisation of staging data, recognising various levels of data completeness. Tier 1 is suitable for all purposes, including clinical and epidemiological applications. Tiers 2 and 3 are recommended for epidemiological analysis alone. Lower tiers provide valuable insights into disease patterns, risk factors, and overall disease burden for public health planning and policy decisions. Capture of staging at each tier depends on data availability, with potential shifts to higher tiers as new data sources are acquired. CONCLUSIONS: The tiered framework offers a dynamic approach for PBCRs to record stage at diagnosis, promoting consistency in population-level staging data and enabling practical use for benchmarking across jurisdictions, public health planning, policy development, epidemiological analyses, and assessing cancer outcomes. Evolution with staging classifications and data variable changes will futureproof the tiered framework. Its adaptability fosters continuous refinement of data collection processes and encourages improvements in data quality.
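
As a reading aid for the tier logic described above, here is a small sketch of a rule that assigns a record to the highest tier its available data supports. The field names are hypothetical and do not reproduce the registry's actual business rules.

    def assign_stage_tier(record: dict) -> int:
        """Return 1, 2 or 3 following the tiered idea: Tier 1 = complete AJCC TNM plus other
        critical staging items; Tier 2 = registry-derived stage using supplementary data
        (e.g. hospital admissions); Tier 3 = pathology reports only."""
        if record.get("ajcc_tnm_complete") and record.get("critical_staging_items_complete"):
            return 1
        if record.get("has_admission_data") or record.get("has_other_supplementary_data"):
            return 2
        return 3

    print(assign_stage_tier({"ajcc_tnm_complete": True, "critical_staging_items_complete": True}))  # 1
    print(assign_stage_tier({"has_admission_data": True}))                                          # 2
    print(assign_stage_tier({"has_pathology_report": True}))                                        # 3

A record can shift to a higher tier as new data sources become available, which is the dynamic behaviour the framework is designed to support.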


Assuntos
Estadiamento de Neoplasias , Neoplasias , Sistema de Registros , Humanos , Austrália Ocidental/epidemiologia , Neoplasias/patologia , Neoplasias/diagnóstico , Neoplasias/epidemiologia , Coleta de Dados/métodos , Coleta de Dados/normas , Benchmarking
20.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38935068

ABSTRACT

BACKGROUND: We present a novel simulation method for generating connected differential expression signatures. Traditional methods have struggled with the lack of reliable benchmarking data and biases in drug-disease pair labeling, limiting the rigorous benchmarking of connectivity-based approaches. OBJECTIVE: Our aim is to develop a simulation method based on a statistical framework that allows for adjustable levels of parametrization, especially the connectivity, to generate a pair of interconnected differential signatures. This could help to address the issue of benchmarking data availability for connectivity-based drug repurposing approaches. METHODS: We first detailed the simulation process and how it reflects real biological variability and the interconnectedness of gene expression signatures. Then, we generated several datasets to enable the evaluation of different existing algorithms that compare differential expression signatures, providing insights into their performance and limitations. RESULTS: Our findings demonstrate the ability of our simulation to produce realistic data, as evidenced by correlation analyses and the log2 fold-change distribution of deregulated genes. Benchmarking reveals that methods like extreme cosine similarity and Pearson correlation outperform others in identifying connected signatures. CONCLUSION: Overall, our method provides a reliable tool for simulating differential expression signatures. The data simulated by our tool encompass a wide spectrum of possibilities to challenge and evaluate existing methods for estimating connectivity scores. This addresses a critical gap in connectivity-based drug repurposing research, because reliable benchmarking data are essential for assessing and advancing the development of new algorithms. The simulation tool is available as an R package (GPL license) at https://github.com/cgonzalez-gomez/cosimu.
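
For orientation on the two top-performing comparison methods named in the results, a brief sketch computing Pearson correlation and an "extreme" cosine similarity (restricted to the most deregulated genes) between two simulated log2 fold-change signatures. The gene-selection rule here is an illustrative assumption, not necessarily cosimu's definition.

    import numpy as np

    rng = np.random.default_rng(7)
    n_genes = 5000
    sig_a = rng.normal(0, 1, n_genes)                      # log2 fold-changes, signature A
    sig_b = 0.6 * sig_a + rng.normal(0, 0.8, n_genes)      # connected signature B

    def pearson(a, b):
        return float(np.corrcoef(a, b)[0, 1])

    def extreme_cosine(a, b, top_n=250):
        # Keep only the most deregulated genes (largest |log2FC|) of signature A, then take the cosine.
        idx = np.argsort(np.abs(a))[-top_n:]
        x, y = a[idx], b[idx]
        return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

    print(f"Pearson r = {pearson(sig_a, sig_b):.2f}, extreme cosine = {extreme_cosine(sig_a, sig_b):.2f}")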


Subjects
Algorithms, Benchmarking, Computer Simulation, Drug Discovery, Drug Discovery/methods, Humans, Gene Expression Profiling/methods, Computational Biology/methods, Drug Repositioning/methods, Transcriptome