Results 1 - 20 of 4,437

1.
Cell ; 179(3): 589-603, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31607513

ABSTRACT

Genome-wide association studies (GWASs) have focused primarily on populations of European descent, but it is essential that diverse populations become better represented. Increasing diversity among study participants will advance our understanding of genetic architecture in all populations and ensure that genetic research is broadly applicable. To facilitate and promote research in multi-ancestry and admixed cohorts, we outline key methodological considerations and highlight opportunities, challenges, solutions, and areas in need of development. Despite the perception that analyzing genetic data from diverse populations is difficult, it is scientifically and ethically imperative, and there is an expanding analytical toolbox to do it well.


Subject(s)
Genome-Wide Association Study/methods, Genotyping Techniques/methods, Human Genetics/methods, Data Accuracy, Genetic Variation, Genetics, Population/methods, Genetics, Population/standards, Genome-Wide Association Study/standards, Genotyping Techniques/standards, Human Genetics/standards, Humans, Pedigree
2.
Proc Natl Acad Sci U S A ; 121(5): e2309384121, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38252835

ABSTRACT

High-quality specimen preparation plays a crucial role in cryo-electron microscopy (cryo-EM) structural analysis. In this study, we have developed a reliable and convenient technique, the graphene sandwich method, for preparing cryo-EM specimens. This method uses two layers of graphene film that enclose macromolecules on both sides, allowing for an appropriate ice thickness for cryo-EM analysis. The graphene sandwich helps to mitigate the beam-induced charging effect and reduces particle motion compared to specimens prepared with the traditional method of graphene support on only one side, thereby improving cryo-EM data quality. These advancements may open new opportunities to expand the use of graphene in the field of biological electron microscopy.


Subject(s)
Graphite, Cryoelectron Microscopy, Data Accuracy, Motion (Physics)
3.
Proc Natl Acad Sci U S A ; 121(34): e2402267121, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39136986

ABSTRACT

Despite ethical and historical arguments for removing race from clinical algorithms, the consequences of removal remain unclear. Here, we highlight a largely undiscussed consideration in this debate: varying data quality of input features across race groups. For example, family history of cancer is an essential predictor in cancer risk prediction algorithms but is less reliably documented for Black participants and may therefore be less predictive of cancer outcomes. Using data from the Southern Community Cohort Study, we assessed whether race adjustments could allow risk prediction models to capture varying data quality by race, focusing on colorectal cancer risk prediction. We analyzed 77,836 adults with no history of colorectal cancer at baseline. The predictive value of self-reported family history was greater for White participants than for Black participants. We compared two cancer risk prediction algorithms: a race-blind algorithm, which included standard colorectal cancer risk factors but not race, and a race-adjusted algorithm, which additionally included race. Relative to the race-blind algorithm, the race-adjusted algorithm improved predictive performance, as measured by goodness of fit in a likelihood ratio test (P-value: <0.001) and area under the receiver operating characteristic curve among Black participants (P-value: 0.006). Because the race-blind algorithm underpredicted risk for Black participants, the race-adjusted algorithm increased the fraction of Black participants among the predicted high-risk group, potentially increasing access to screening. More broadly, this study shows that race adjustments may be beneficial when the data quality of key predictors in clinical algorithms differs by race group.
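
To make the comparison concrete, here is a minimal sketch of testing a race-blind against a race-adjusted logistic model with a likelihood ratio test, the study's headline comparison. The data are simulated and the column names are hypothetical, not the actual Southern Community Cohort Study variables or model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "age": rng.normal(60, 8, n),
    "family_history": rng.binomial(1, 0.12, n),
    "black": rng.binomial(1, 0.5, n),
})
# Simulate two features of the paper's setting: family history is less
# predictive for Black participants, and the race-blind predictors miss
# some risk, so a race term carries real signal.
logit = (-4 + 0.03 * df.age
         + (1.0 - 0.6 * df.black) * df.family_history
         + 0.3 * df.black)
df["case"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_blind = sm.add_constant(df[["age", "family_history"]])
X_adj = sm.add_constant(df[["age", "family_history", "black"]])
m_blind = sm.Logit(df["case"], X_blind).fit(disp=0)
m_adj = sm.Logit(df["case"], X_adj).fit(disp=0)

# Likelihood ratio test: does adding race improve goodness of fit?
lr = 2 * (m_adj.llf - m_blind.llf)
print(f"LR = {lr:.1f}, p = {stats.chi2.sf(lr, df=1):.2g}")
```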


Subject(s)
Algorithms, Colorectal Neoplasms, Humans, Colorectal Neoplasms/diagnosis, Colorectal Neoplasms/ethnology, Colorectal Neoplasms/epidemiology, Male, Female, Middle Aged, Data Accuracy, White People/statistics & numerical data, Black or African American/statistics & numerical data, Risk Factors, Aged, Adult, Cohort Studies, Racial Groups/statistics & numerical data, Risk Assessment/methods
4.
PLoS Biol ; 21(11): e3002345, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37910647

ABSTRACT

Upon completion of an experiment, if a trend is observed that is "not quite significant," it can be tempting to collect more data in an effort to achieve statistical significance. Such sample augmentation, or "N-hacking," is condemned because it can lead to an excess of false positives, which reduces the reproducibility of results. However, the scenarios used to prove this rule tend to be unrealistic, assuming the addition of unlimited extra samples to achieve statistical significance, or doing so when results are not even close to significant, situations that are unlikely in most experiments involving patient samples, cultured cells, or live animals. If we were to examine more realistic scenarios, could there be situations where N-hacking is an acceptable practice? This Essay aims to address this question, using simulations to demonstrate how N-hacking causes false positives and to investigate whether this increase is still relevant when using parameters based on real-life experimental settings.
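
A toy simulation of the practice under discussion (a sketch, not the Essay's exact simulation code): every experiment is run under the null, and a "not quite significant" result (0.05 <= p < 0.10) triggers up to three rounds of extra samples. The group size of 10, top-up size of 5, and both thresholds are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def run_experiment(n0=10, n_add=5, max_rounds=3, alpha=0.05):
    # Both groups drawn from the same distribution: the null is true.
    a, b = list(rng.normal(size=n0)), list(rng.normal(size=n0))
    for _ in range(max_rounds + 1):
        p = stats.ttest_ind(a, b).pvalue
        if p < alpha:
            return True                    # declared "significant"
        if p >= 0.10:
            return False                   # give up: not even close
        a += list(rng.normal(size=n_add))  # N-hack: collect more data
        b += list(rng.normal(size=n_add))
    return False

trials = 10_000
fp = sum(run_experiment() for _ in range(trials)) / trials
print(f"Empirical false positive rate: {fp:.3f} (nominal alpha = 0.05)")
```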


Subject(s)
Data Accuracy, Research Design, Reproducibility of Results, Research Design/standards
5.
EMBO J ; 40(3): e105889, 2021 02 01.
Article in English | MEDLINE | ID: mdl-33480052

ABSTRACT

Image data are universal in life sciences research. Their proper handling is not. A significant proportion of image data in research papers show signs of mishandling that undermine their interpretation. We propose that a precise description of the image processing and analysis applied is required to address this problem. A new norm for reporting reproducible image analyses will diminish mishandling, as it will alert co-authors, referees, and journals to aberrant image data processing or, if published nonetheless, it will document it to the reader. To promote this norm, we discuss the effectiveness of this approach and give some step-by-step instructions for publishing reproducible image data processing and analysis workflows.
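
A minimal sketch of what the proposed norm could look like in practice: every processing step is scripted with explicit, recorded parameters, so the analysis can be reported and re-run exactly. The pipeline and parameter names here are illustrative, not the authors' protocol; it uses scikit-image.

```python
import json
from skimage import data, filters, measure

# All analysis parameters live in one reportable structure.
params = {"filter": "gaussian", "sigma": 2.0, "threshold": "otsu"}

img = data.coins()                                  # example image
smoothed = filters.gaussian(img, sigma=params["sigma"])
thresh = filters.threshold_otsu(smoothed)
labels = measure.label(smoothed > thresh)           # segment objects

# Publishing this JSON alongside the figure documents the exact analysis.
print(json.dumps({"params": params, "n_objects": int(labels.max())}))
```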


Subject(s)
Image Processing, Computer-Assisted/methods, Image Processing, Computer-Assisted/standards, Publishing/standards, Data Accuracy, Humans, Reproducibility of Results, Scientific Misconduct, Workflow
6.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38189543

ABSTRACT

Recently, the attention mechanism and models derived from it have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and artificial intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
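
For orientation, this is the scaled dot-product attention computation at the core of the models reviewed, written in plain NumPy; a generic sketch, not any specific drug-discovery architecture.

```python
import numpy as np

def attention(Q, K, V):
    """Q: (n, d), K: (m, d), V: (m, dv) -> (n, dv)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(attention(Q, K, V).shape)  # (4, 8)
```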


Subject(s)
Artificial Intelligence, Drug Discovery, Drug Development, Data Accuracy
7.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36562715

ABSTRACT

As one of the most vital approaches in drug development, drug repositioning emphasizes further analysis and research of approved drugs, based on the existing wealth of clinical and experimental data, to identify new indications. However, existing drug repositioning methods do not achieve sufficient prediction performance and do not consider information on drug effectiveness, which makes it difficult to obtain reliable and valuable results. In this study, we propose a drug repositioning framework termed DRONet, which makes full use of effectiveness comparative relationships (ECR) among drugs as prior information by combining network embedding and ranking learning. We utilized network embedding methods to learn deep features of drugs from a heterogeneous drug-disease network, and constructed a high-quality drug-indication dataset including effectiveness-based drug contrast relationships. The embedding features and ECR of drugs are combined through a designed ranking learning model to prioritize candidate drugs. Comprehensive experiments show that DRONet achieves higher prediction accuracy than state-of-the-art methods (improving Hit@1 by 87.4% and mean reciprocal rank by 37.9%). Case analysis also demonstrates the high reliability of the predicted results, which have the potential to guide clinical drug development.
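
The paper evaluates ranking quality with Hit@1 and mean reciprocal rank (MRR); here is a small sketch of how those metrics are computed, with made-up drug rankings for illustration (not DRONet's outputs).

```python
import numpy as np

def hit_at_k(ranked_lists, truths, k=1):
    # Fraction of queries whose true item appears in the top k.
    return np.mean([t in r[:k] for r, t in zip(ranked_lists, truths)])

def mrr(ranked_lists, truths):
    # Mean of 1/rank of the true item (rank is 1-based).
    return np.mean([1.0 / (r.index(t) + 1) for r, t in zip(ranked_lists, truths)])

# Each inner list ranks candidate drugs for one disease; truths hold the
# drug actually indicated for that disease.
ranked = [["drugA", "drugB", "drugC"], ["drugB", "drugA", "drugC"]]
truths = ["drugA", "drugC"]
print(hit_at_k(ranked, truths, k=1), mrr(ranked, truths))  # 0.5 0.666...
```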


Subject(s)
Computational Biology, Drug Repositioning, Computational Biology/methods, Drug Repositioning/methods, Reproducibility of Results, Data Accuracy, Algorithms
8.
Nucleic Acids Res ; 51(D1): D571-D582, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36305834

ABSTRACT

Ferroptosis is a mode of regulated cell death characterized by iron-dependent accumulation of lipid peroxidation. It is closely linked to the pathophysiological processes in many diseases. Since our publication of the first ferroptosis database in 2020 (FerrDb V1), many new findings have been published. To keep up with the rapid progress in ferroptosis research and to provide timely and high-quality data, here we present the successor, FerrDb V2. It contains 1001 ferroptosis regulators and 143 ferroptosis-disease associations manually curated from 3288 articles. Specifically, there are 621 gene regulators, of which 264 are drivers, 238 are suppressors, 9 are markers, and 110 are unclassified genes; and there are 380 substance regulators, with 201 inducers and 179 inhibitors. Compared to FerrDb V1, curated articles increase by >300%, ferroptosis regulators increase by 175%, and ferroptosis-disease associations increase by 50.5%. Circular RNAs and pseudogenes are novel regulator types in FerrDb V2, and the percentage of non-coding RNA increases from 7.3% to 13.6%. External gene-related data were integrated, enabling thought-provoking and gene-oriented analysis in FerrDb V2. In conclusion, FerrDb V2 will help to acquire deeper insights into ferroptosis. FerrDb V2 is freely accessible at http://www.zhounan.org/ferrdb/.


Subject(s)
Ferroptosis, Ferroptosis/genetics, Data Accuracy, Databases, Factual, Lipid Peroxidation, Pseudogenes
9.
Proc Natl Acad Sci U S A ; 119(15): e2113561119, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35394862

ABSTRACT

Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multimodel ensemble forecast that combined predictions from dozens of groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-wk horizon three to five times larger than when predicting at a 1-wk horizon. This project underscores the role that collaboration and active coordination between governmental public-health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
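
A sketch of the central ensembling idea: combine each model's quantile forecasts by taking, at every quantile level, the median across models, an approach the Forecast Hub has used for its ensemble. The numbers below are illustrative, not Hub data.

```python
import numpy as np

quantile_levels = [0.025, 0.25, 0.5, 0.75, 0.975]
# rows = individual models, columns = predicted weekly deaths per quantile
model_forecasts = np.array([
    [120, 180, 230, 290, 380],
    [100, 160, 210, 260, 340],
    [150, 210, 270, 330, 430],
])
ensemble = np.median(model_forecasts, axis=0)  # quantile-wise median
for q, v in zip(quantile_levels, ensemble):
    print(f"q={q:>5}: {v:.0f} deaths")
```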


Subject(s)
COVID-19, COVID-19/mortality, Data Accuracy, Forecasting, Humans, Pandemics, Probability, Public Health/trends, United States/epidemiology
10.
Proc Natl Acad Sci U S A ; 119(43): e2109313118, 2022 10 25.
Article in English | MEDLINE | ID: mdl-36251987

ABSTRACT

Investments in data management infrastructure often seek to catalyze new research outcomes based on the reuse of research data. To achieve the goals of these investments, we need to better understand how data creation and data quality concerns shape the potential reuse of data. The primary audience for this paper is scientific domain specialists who create and (re)use datasets documenting archaeological materials. This paper discusses practices that promote data quality in support of more open-ended reuse of data beyond the immediate needs of the creators. We argue that identifier practices play a key, but poorly recognized, role in promoting data quality and reusability. We use specific archaeological examples to demonstrate how the use of globally unique and persistent identifiers can communicate aspects of context, avoid errors and misinterpretations, and facilitate integration and reuse. We then discuss the responsibility of data creators and data reusers to employ identifiers to better maintain the contextual integrity of data, including professional, social, and ethical dimensions.
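
A minimal sketch of the identifier practice the authors advocate: mint a globally unique, persistent identifier for every record, and reference records by identifier rather than by ambiguous local labels. UUIDs stand in here for whatever persistent-identifier scheme (e.g., DOI or ARK) a project actually uses.

```python
import uuid

catalog = {}

def register(record):
    pid = f"urn:uuid:{uuid.uuid4()}"   # globally unique, never reused
    catalog[pid] = record
    return pid

sherd = register({"type": "ceramic sherd", "context": "Unit 7, Stratum III"})
sample = register({"type": "charcoal sample", "found_with": sherd})

# Later reuse resolves the cross-reference unambiguously:
print(catalog[catalog[sample]["found_with"]]["context"])  # Unit 7, Stratum III
```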


Subject(s)
Archaeology, Data Accuracy
11.
J Infect Dis ; 229(Supplement_2): S163-S171, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-37968965

ABSTRACT

BACKGROUND: In response to the mpox outbreak and public health emergency, DCHHS aimed to develop NGS-based techniques to streamline mpox viral clade and lineage analysis. METHODS: The mpox sequencing workflow started with DNA extraction and adapted Illumina's COVIDSeq assay using hMpox primer pools from the Yale School of Public Health. Sequencing steps included cDNA amplification, tagmentation, PCR indexing, library pooling, sequencing on MiSeq, data analysis, and report generation. The bioinformatic analysis comprised read assembly, consensus sequence mapping to reference genomes, and variant identification, and utilized pipelines including Illumina BaseSpace, NextClade, CLC Workbench, and Terra.bio for data quality control (QC) and validation. RESULTS: In total, 171 mpox samples were sequenced using the modified COVIDSeq workflow, and QC metrics were assessed for read quality, depth, and coverage. Multiple analysis pipelines identified the West African clade IIb as the only clade during the peak of mpox infection from July through October 2022. Analyses also indicated lineage B.1.2 as the dominant variant, comprising the majority of mpox viral genomes (77.7%), reflecting its geographical distribution in the United States. Viral sequences were uploaded to GISAID EpiPox. CONCLUSIONS: We developed NGS workflows to precisely detect and analyze mpox viral clades and lineages, aiding public health genomic surveillance.
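
As an illustration of the depth and coverage QC mentioned above (not the DCHHS pipeline itself), a sketch that computes mean depth and breadth of coverage from `samtools depth -a` output; the file name and the 10x threshold are assumptions.

```python
def coverage_qc(depth_file, min_depth=10):
    """Read `samtools depth -a` output (chrom, pos, depth per line)."""
    total = covered = positions = 0
    with open(depth_file) as fh:
        for line in fh:
            _, _, depth = line.split()
            depth = int(depth)
            positions += 1
            total += depth
            covered += depth >= min_depth
    return total / positions, covered / positions

mean_depth, breadth = coverage_qc("sample01.depth.tsv")  # hypothetical file
print(f"mean depth {mean_depth:.1f}x; {breadth:.1%} of genome >= 10x")
```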


Subject(s)
Mpox, Humans, Genomics/methods, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods, Data Accuracy
12.
J Proteome Res ; 23(6): 1926-1936, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38691771

ABSTRACT

Data-independent acquisition has seen breakthroughs that enable comprehensive proteome profiling using short gradients. As proteome coverage continues to increase, the quality of the data generated becomes much more relevant. Using Spectronaut, we show that the default search parameters can be easily optimized to minimize the occurrence of false positives across different samples. Using an immunological infection model system to demonstrate the impact of adjusting search settings, we analyzed Mus musculus macrophages and compared their proteome to macrophages spiked with Candida albicans. This experimental system enabled the identification of "false positives", as Candida albicans peptides and proteins should not be present in the Mus musculus-only samples. We show that adjusting the search parameters reduced "false positive" identifications by 89% at the peptide and protein level, thereby considerably increasing the quality of the data. We also show that these optimized parameters incurred a moderate cost, reducing the overall number of "true positive" identifications across each biological replicate by <6.7% at both the peptide and protein level. We believe the value of our updated search parameters extends beyond a two-organism analysis and would be of great value to any DIA experiment analyzing heterogeneous populations of cell types or tissues.
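
A sketch of the paper's error-estimation logic: in a mouse-only sample, any identification assigned to Candida albicans must be a false positive, so the Candida fraction estimates the empirical error rate. The input format (a CSV with one identification and its assigned organism per row) is a hypothetical stand-in for a search-engine report.

```python
import csv

def false_positive_rate(report_csv):
    total = candida = 0
    with open(report_csv) as fh:
        for row in csv.DictReader(fh):
            total += 1
            # In mouse-only samples, Candida hits are known false positives.
            candida += row["organism"] == "Candida albicans"
    return candida / total

fpr = false_positive_rate("mouse_only_sample.csv")  # hypothetical file
print(f"{fpr:.2%} of identifications are known false positives")
```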


Subject(s)
Candida albicans, Macrophages, Proteome, Proteomics, Animals, Mice, Proteome/analysis, Proteomics/methods, Macrophages/metabolism, Macrophages/immunology, Data Accuracy, Peptides/analysis
13.
Int J Cancer ; 154(5): 816-829, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-37860893

ABSTRACT

Adolescents and young adults (AYA) with germ cell tumours (GCT) have poorer survival rates than children and many older adults with the same cancers. There are several likely contributing factors, including the treatment received. The prognostic benefit of intended dose intensity is well documented in GCT from trials comparing regimens. However, evidence specific to AYA is limited by poor recruitment of AYA to trials, and dose delivery outside trials has not been well examined. We examined the utility of cancer registration data and a clinical trials dataset to investigate the delivery of relative dose intensity (RDI) in routine National Health Service practice in England, compared to within international clinical trials. Linked data from the Cancer Outcomes and Services Dataset (COSD) and the Systemic Anti-Cancer Therapy (SACT) dataset, and data from four international clinical trials, were analysed. Survival over time was described using Kaplan-Meier estimation, overall and by age category, International Germ-Cell Cancer Collaborative Group (IGCCCG) classification, stage, tumour subtype, primary site, ethnicity and deprivation. Cox regression models were used to determine the fully adjusted effect of RDI on mortality risk. The quality of both datasets was critically evaluated and clinically enhanced. RDI was found to be well maintained in all datasets, with higher RDIs associated with improved survival outcomes. Real-world data demonstrated several strengths, including population coverage and inclusion of sociodemographic variables and comorbidity. In GCT, however, it is limited by poor completion of the data items that enable risk classification of patients and by a higher proportion of missing data.
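
A hedged sketch of the two quantities at the heart of the analysis: relative dose intensity (delivered dose intensity over intended dose intensity) and its adjusted effect on mortality from a Cox model. Column names and values are illustrative, not COSD/SACT fields; the `lifelines` package supplies the Cox fit.

```python
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "delivered_dose": [380, 400, 300, 410, 250, 390, 360, 400],  # mg/m2 given
    "intended_dose": [400] * 8,                                  # per protocol
    "delivered_weeks": [10, 9, 12, 9, 13, 9, 11, 10],
    "intended_weeks": [9] * 8,
    "age": [22, 31, 45, 40, 27, 19, 33, 52],
    "time_months": [30, 58, 24, 61, 55, 60, 18, 49],
    "died": [1, 0, 1, 0, 0, 0, 1, 0],
})
# RDI = (delivered dose / delivered time) / (intended dose / intended time)
df["rdi"] = ((df.delivered_dose / df.delivered_weeks)
             / (df.intended_dose / df.intended_weeks))

cph = CoxPHFitter()
cph.fit(df[["rdi", "age", "time_months", "died"]],
        duration_col="time_months", event_col="died")
cph.print_summary()  # hazard ratio for RDI, adjusted for age
```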


Subject(s)
Neoplasms, Germ Cell and Embryonal, Neoplasms, Child, Humans, Adolescent, Young Adult, Aged, Data Accuracy, State Medicine, Neoplasms/epidemiology, Neoplasms, Germ Cell and Embryonal/epidemiology, Prognosis
14.
Anal Chem ; 96(3): 1064-1072, 2024 01 23.
Article in English | MEDLINE | ID: mdl-38179935

ABSTRACT

The implementation of quality control strategies is crucial to ensure the reproducibility, accuracy, and meaningfulness of metabolomics data. However, this pivotal step is often overlooked within the metabolomics workflow and frequently relies on the use of nonstandardized and poorly reported protocols. To address current limitations in this respect, we have developed QComics, a robust, easily implementable and reportable method for monitoring and controlling data quality. The protocol operates in various sequential steps aimed to (i) correct for background noise and carryover, (ii) detect signal drifts and "out-of-control" observations, (iii) deal with missing data, (iv) remove outliers, (v) monitor quality markers to identify samples affected by improper collection, preprocessing, or storage, and (vi) assess overall data quality in terms of precision and accuracy. Notably, this tool considers important issues often neglected along quality control, such as the need of separately handling missing values and truly absent data to avoid losing relevant biological information, as well as the large impact that preanalytical factors may elicit on metabolomics results. Altogether, the guidelines compiled in QComics might contribute to establishing gold standard recommendations and best practices for quality control within the metabolomics community.
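
Two QComics-style checks sketched in code (an illustration, not the published protocol): detecting signal drift across injection order in pooled-QC samples, and flagging "out-of-control" observations beyond mean +/- 3 SD. The simulated drift, noise level, and thresholds are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
order = np.arange(40)                        # injection order of pooled QCs
intensity = 1000 - 4 * order + rng.normal(0, 30, 40)  # a drifting feature
intensity[20] += 400                         # inject one aberrant QC run

# Check 1: regress intensity on injection order to detect signal drift.
slope, _, r, p, _ = stats.linregress(order, intensity)
if p < 0.05:
    print(f"drift detected: slope {slope:.1f} per injection (r={r:.2f})")

# Check 2: flag "out-of-control" observations beyond mean +/- 3 SD.
z = (intensity - intensity.mean()) / intensity.std()
print("out-of-control QC injections:", np.where(np.abs(z) > 3)[0])
```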


Subject(s)
Data Accuracy, Metabolomics, Reproducibility of Results, Metabolomics/methods, Quality Control, Workflow
15.
Mod Pathol ; 37(1): 100369, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37890670

ABSTRACT

Generative adversarial networks (GANs) have gained significant attention in the field of image synthesis, particularly in computer vision. GANs consist of a generative model and a discriminative model trained in an adversarial setting to generate realistic and novel data. In the context of image synthesis, the generator produces synthetic images, whereas the discriminator determines their authenticity by comparing them with real examples. Through iterative training, the generator learns to create images that are indistinguishable from real ones, leading to high-quality image generation. Considering their success in computer vision, GANs hold great potential for medical diagnostic applications. In the medical field, GANs can generate images of rare diseases, aid in learning, and be used as visualization tools. GANs can leverage unlabeled medical images, which are large, numerous, and challenging to annotate manually. GANs have demonstrated remarkable capabilities in image synthesis and have the potential to significantly impact digital histopathology. This review article focuses on the emerging use of GANs in digital histopathology, examining their applications and potential challenges. Histopathology plays a crucial role in disease diagnosis, and GANs can contribute by generating realistic microscopic images. However, ethical considerations arise because of the reliance on synthetic or pseudogenerated images. Therefore, the manuscript also explores the current limitations and highlights the ethical considerations associated with the use of this technology. In conclusion, digital histopathology has seen an emerging use of GANs for image enhancement, such as color (stain) normalization, virtual staining, and ink/marker removal. GANs offer significant potential in transforming digital pathology when applied to specific and narrow tasks (preprocessing enhancements). Evaluating data quality, addressing biases, protecting privacy, ensuring accountability and transparency, and developing regulation are imperative to ensure the ethical application of GANs.
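
To make the generator/discriminator dynamic concrete, here is a minimal GAN training loop on toy one-dimensional data in PyTorch; a didactic sketch, far simpler than a histopathology-scale image GAN.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0          # "real" data: N(3, 0.5)
    fake = G(torch.randn(64, 8))                   # generator samples
    # Discriminator: label real as 1, fake as 0.
    loss_d = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: fool the discriminator into labeling fakes as real.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())       # should move toward 3.0
```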


Subject(s)
Coloring Agents, Data Accuracy, Humans, Staining and Labeling, Image Processing, Computer-Assisted
16.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36383167

ABSTRACT

MOTIVATION: Single-cell multi-omics sequencing techniques have developed rapidly in the past few years. Clustering analysis with single-cell multi-omics data may give us novel perspectives to dissect cellular heterogeneity. However, multi-omics data are inherently high-dimensional and sparse, and may contain doublets. Moreover, representations of different omics from even the same cell follow diverse distributions. Without proper distribution alignment techniques, clustering methods produce poorly separated clusters that are easily skewed by less informative omics data. RESULTS: We developed MoClust, a novel joint clustering framework that can be applied to several types of single-cell multi-omics data. A selective automatic doublet detection module that can identify and filter out doublets is introduced in the pretraining stage to improve data quality. Omics-specific autoencoders are introduced to characterize the multi-omics data. A contrastive learning approach to distribution alignment is adopted to adaptively fuse omics representations into an omics-invariant representation. This novel alignment boosts the compactness and separability of clusters, while accurately weighting the contribution of each omics layer to the clustering objective. Extensive experiments, over both simulated and real multi-omics datasets, demonstrated the powerful alignment, doublet detection and clustering capabilities of MoClust. AVAILABILITY AND IMPLEMENTATION: An implementation of MoClust is available from https://doi.org/10.5281/zenodo.7306504. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
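
A generic sketch of contrastive alignment between two omics embeddings of the same cells (an InfoNCE-style loss in which the positive pair is the same cell seen through two omics); MoClust's actual objective and architecture differ in detail.

```python
import torch
import torch.nn.functional as F

def alignment_loss(z_rna, z_atac, temperature=0.1):
    """z_rna, z_atac: (cells, dim) embeddings from omics-specific encoders."""
    z1 = F.normalize(z_rna, dim=1)
    z2 = F.normalize(z_atac, dim=1)
    logits = z1 @ z2.T / temperature      # cell i vs every cell's other omics
    targets = torch.arange(z1.size(0))    # positive pair: the same cell
    return F.cross_entropy(logits, targets)

z_rna, z_atac = torch.randn(128, 32), torch.randn(128, 32)
print(alignment_loss(z_rna, z_atac).item())
```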


Subject(s)
Data Accuracy, Multiomics, Cluster Analysis
17.
Bioinformatics ; 39(39 Suppl 1): i111-i120, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387181

ABSTRACT

MOTIVATION: Transcriptomics data are becoming more accessible due to high-throughput and less costly sequencing methods. However, data scarcity prevents exploiting deep learning models' full predictive power for phenotype prediction. Artificially enhancing the training sets, namely data augmentation, is suggested as a regularization strategy. Data augmentation corresponds to label-invariant transformations of the training set (e.g. geometric transformations on images and syntax parsing on text data). Such transformations are, unfortunately, unknown in the transcriptomic field. Therefore, deep generative models such as generative adversarial networks (GANs) have been proposed to generate additional samples. In this article, we analyze GAN-based data augmentation strategies with respect to performance indicators and the classification of cancer phenotypes. RESULTS: This work highlights a significant boost in binary and multiclass classification performance due to augmentation strategies. Without augmentation, training a classifier on only 50 RNA-seq samples yields an accuracy of, respectively, 94% and 70% for binary and tissue classification. In comparison, we achieved 98% and 94% accuracy when adding 1000 augmented samples. Richer architectures and more expensive GAN training return better augmentation performance and higher generated-data quality overall. Further analysis of the generated data shows that several performance indicators are needed to assess its quality correctly. AVAILABILITY AND IMPLEMENTATION: All data used for this research are publicly available and come from The Cancer Genome Atlas. Reproducible code is available on the GitLab repository: https://forge.ibisc.univ-evry.fr/alacan/GANs-for-transcriptomics.
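
A sketch of the evaluation design: train a classifier on 50 real samples, then on the same samples plus 1000 synthetic ones, and compare held-out accuracy. A class-conditional Gaussian stands in for the trained GAN generator, and the data are simulated, not TCGA.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
d = 200                                             # pseudo "genes"
means = {0: rng.normal(0, 1, d), 1: rng.normal(0.5, 1, d)}

def sample(label, n):
    # Idealized stand-in for a generator trained on each class.
    return means[label] + rng.normal(0, 1.0, (n, d))

X_small = np.vstack([sample(0, 25), sample(1, 25)])
y_small = np.array([0] * 25 + [1] * 25)
X_test = np.vstack([sample(0, 500), sample(1, 500)])
y_test = np.array([0] * 500 + [1] * 500)

clf = LogisticRegression(max_iter=1000).fit(X_small, y_small)
print("50 real samples:", clf.score(X_test, y_test))

X_aug = np.vstack([X_small, sample(0, 500), sample(1, 500)])  # +1000 synthetic
y_aug = np.concatenate([y_small, [0] * 500, [1] * 500])
clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
print("+1000 augmented:", clf.score(X_test, y_test))
```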


Subject(s)
Gene Expression Profiling, Transcriptome, RNA-Seq, Data Accuracy, Phenotype
18.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37647640

ABSTRACT

MOTIVATION: Existing methods for simulating synthetic genotype and phenotype datasets have limited scalability, constraining their usability for large-scale analyses. Moreover, a systematic approach for evaluating synthetic data quality and a benchmark synthetic dataset for developing and evaluating methods for polygenic risk scores are lacking. RESULTS: We present HAPNEST, a novel approach for efficiently generating diverse individual-level genotypic and phenotypic data. In comparison to alternative methods, HAPNEST shows faster computational speed and a lower degree of relatedness with reference panels, while generating datasets that preserve key statistical properties of real data. These desirable synthetic data properties enabled us to generate 6.8 million common variants and nine phenotypes with varying degrees of heritability and polygenicity across 1 million individuals. We demonstrate how HAPNEST can facilitate biobank-scale analyses through a comparison of seven polygenic risk scoring methods across multiple ancestry groups and different genetic architectures. AVAILABILITY AND IMPLEMENTATION: A synthetic dataset of 1,008,000 individuals and nine traits for 6.8 million common variants is available at https://www.ebi.ac.uk/biostudies/studies/S-BSST936. The HAPNEST software for generating synthetic datasets is available as Docker/Singularity containers and open source Julia and C code at https://github.com/intervene-EU-H2020/synthetic_data.
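
For contrast with HAPNEST's haplotype-based approach, the naive baseline for synthetic genotypes: draw each genotype independently from reference allele frequencies. This sketch ignores linkage disequilibrium entirely, which is precisely the kind of structure tools like HAPNEST are built to preserve.

```python
import numpy as np

rng = np.random.default_rng(4)
n_variants, n_individuals = 1000, 50
freqs = rng.uniform(0.01, 0.5, n_variants)   # reference allele frequencies

# Genotype = count of alternate alleles (0/1/2), binomial per variant,
# independent across variants (no linkage disequilibrium).
genotypes = rng.binomial(2, freqs[:, None], (n_variants, n_individuals))
print(genotypes.shape, genotypes.mean() / 2)  # mean allele frequency check
```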


Subject(s)
Benchmarking, Data Accuracy, Humans, Genotype, Phenotype, Multifactorial Inheritance
19.
Ann Rheum Dis ; 83(1): 112-120, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-37907255

ABSTRACT

OBJECTIVES: This study aims to describe the data structure and harmonisation process, explore data quality and define characteristics, treatment, and outcomes of patients across six federated antineutrophil cytoplasmic antibody-associated vasculitis (AAV) registries. METHODS: Through creation of the vasculitis-specific Findable, Accessible, Interoperable, Reusable (FAIR) VASCulitis ontology, we harmonised the registries and enabled semantic interoperability. We assessed data quality across the domains of uniqueness, consistency, completeness and correctness. Aggregated data were retrieved using the semantic query language SPARQL Protocol and Resource Description Framework Query Language (SPARQL), and outcome rates were assessed through random-effects meta-analysis. RESULTS: A total of 5282 cases of AAV were identified. Uniqueness and data-type consistency were 100% across all assessed variables. Completeness ranged from 49% to 100% and correctness from 60% to 100%. There were 2754 (52.1%) cases classified as granulomatosis with polyangiitis (GPA), 1580 (29.9%) as microscopic polyangiitis and 937 (17.7%) as eosinophilic GPA. The pattern of organ involvement included: lung in 3281 (65.1%), ear-nose-throat in 2860 (56.7%) and kidney in 2534 (50.2%). Intravenous cyclophosphamide was used as remission induction therapy in 982 (50.7%), rituximab in 505 (17.7%), and pulsed intravenous glucocorticoid use was highly variable (11%-91%). Overall mortality and incidence rates of end-stage kidney disease were 28.8 (95% CI 19.7 to 42.2) and 24.8 (95% CI 19.7 to 31.1) per 1000 patient-years, respectively. CONCLUSIONS: In the largest reported AAV cohort study, we federated patient registries using semantic web technologies and highlighted concerns about data quality. The comparison of patient characteristics, treatment and outcomes was hampered by heterogeneous recruitment settings.
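
A sketch of the aggregated-retrieval step: a SPARQL query executed from Python with SPARQLWrapper. The endpoint URL and ontology IRIs are hypothetical placeholders, not the project's actual resources.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://registry.example.org/sparql")  # placeholder
sparql.setQuery("""
    PREFIX vasc: <https://example.org/vasculitis-ontology#>
    SELECT ?diagnosis (COUNT(?patient) AS ?n)
    WHERE { ?patient vasc:hasDiagnosis ?diagnosis . }
    GROUP BY ?diagnosis
""")
sparql.setReturnFormat(JSON)

# Aggregate counts per diagnosis, retrieved without moving patient-level data.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["diagnosis"]["value"], row["n"]["value"])
```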


Subject(s)
Anti-Neutrophil Cytoplasmic Antibody-Associated Vasculitis, Granulomatosis with Polyangiitis, Microscopic Polyangiitis, Humans, Granulomatosis with Polyangiitis/drug therapy, Granulomatosis with Polyangiitis/epidemiology, Granulomatosis with Polyangiitis/complications, Data Accuracy, Anti-Neutrophil Cytoplasmic Antibody-Associated Vasculitis/drug therapy, Anti-Neutrophil Cytoplasmic Antibody-Associated Vasculitis/epidemiology, Anti-Neutrophil Cytoplasmic Antibody-Associated Vasculitis/complications, Microscopic Polyangiitis/drug therapy, Microscopic Polyangiitis/epidemiology, Antibodies, Antineutrophil Cytoplasmic, Registries, Information Storage and Retrieval
20.
Ann Surg Oncol ; 31(9): 5546-5559, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38717542

ABSTRACT

BACKGROUND: Standardization of procedures for data abstraction by cancer registries is fundamental for cancer surveillance, clinical and policy decision-making, hospital benchmarking, and research efforts. The objective of the current study was to evaluate the hospital-based National Cancer Database (NCDB) against the four components (completeness, comparability, timeliness, and validity) defined by Bray and Parkin that determine registries' ability to carry out these activities. METHODS: This study used data from U.S. Cancer Statistics, the official federal cancer statistics and a joint effort between the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI) that includes data from the National Program of Cancer Registries (NPCR) and the Surveillance, Epidemiology, and End Results (SEER) program, to evaluate NCDB completeness between 2016 and 2020. The study evaluated comparability of case identification and coding procedures. It used Commission on Cancer (CoC) standards from 2022 to assess timeliness and validity. RESULTS: Completeness was demonstrated with a total of 6,828,507 cases identified within the NCDB, representing 73.7% of all cancer cases nationwide. Comparability was followed using standardized and international guidelines on coding and classification procedures. For timeliness, hospital compliance with timely data submission was 92.7%. Validity criteria for re-abstracting, recording, and reliability procedures across hospitals demonstrated 94.2% compliance. Additionally, data validity was shown by a 99.1% compliance with histologic verification standards, a 93.6% assessment of pathologic synoptic reporting, and a 99.1% internal consistency of staff credentials. CONCLUSION: The NCDB is characterized by a high level of case completeness and comparability with uniform standards for data collection, and by hospitals with high compliance, timely data submission, and high rates of compliance with validity standards for registry and data quality evaluation.
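
The completeness figure as simple arithmetic, in the Bray-and-Parkin sense used here: registry cases over all incident cases in the population. The nationwide denominator below is back-calculated from the stated 73.7% for illustration.

```python
def completeness(registry_cases: int, population_cases: int) -> float:
    """Registry cases as a fraction of all incident cases."""
    return registry_cases / population_cases

ncdb_cases = 6_828_507
nationwide = round(ncdb_cases / 0.737)   # ~9.27 million US cases, 2016-2020
print(f"NCDB completeness: {completeness(ncdb_cases, nationwide):.1%}")
```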


Subject(s)
Data Accuracy, Databases, Factual, Neoplasms, Registries, Humans, Registries/standards, Registries/statistics & numerical data, Neoplasms/epidemiology, United States, Databases, Factual/standards, SEER Program/standards