Results 1 - 20 of 235
1.
Nature ; 596(7873): 590-596, 2021 08.
Article in English | MEDLINE | ID: mdl-34293799

ABSTRACT

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure1. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
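The "confident" and "very high confidence" tiers above correspond to thresholds on AlphaFold's per-residue pLDDT score (approximately 70 and 90 on a 0-100 scale). As a minimal illustration only, with made-up pLDDT values rather than real predictions:

```python
# Sketch: fraction of residues at or above a pLDDT confidence threshold.
# Thresholds of 70 ("confident") and 90 ("very high") follow the tiers
# described in the abstract; the scores below are invented for illustration.

def coverage(plddt_scores, threshold):
    """Return the fraction of residues with pLDDT >= threshold."""
    if not plddt_scores:
        return 0.0
    return sum(s >= threshold for s in plddt_scores) / len(plddt_scores)

scores = [95.2, 88.1, 71.4, 90.0, 34.9, 62.3, 91.7, 55.0]
confident = coverage(scores, 70)   # 5 of 8 residues -> 0.625
very_high = coverage(scores, 90)   # 3 of 8 residues -> 0.375
```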


Subject(s)
Computational Biology/standards, Deep Learning/standards, Models, Molecular, Protein Conformation, Proteome/chemistry, Datasets as Topic/standards, Diacylglycerol O-Acyltransferase/chemistry, Glucose-6-Phosphatase/chemistry, Humans, Membrane Proteins/chemistry, Protein Folding, Reproducibility of Results
2.
Nature ; 600(7890): 695-700, 2021 12.
Article in English | MEDLINE | ID: mdl-34880504

ABSTRACT

Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox1. Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi-Facebook2,3 (about 250,000 responses per week) and Census Household Pulse4 (about 75,000 every two weeks). In May 2021, Delphi-Facebook overestimated uptake by 17 percentage points (14-20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 (11-17 percentage points with 5% benchmark imprecision), compared to a retroactively updated benchmark the Centers for Disease Control and Prevention published on 26 May 2021. Moreover, their large sample sizes led to minuscule margins of error on the incorrect estimates. By contrast, an Axios-Ipsos online panel5 with about 1,000 responses per week following survey research best practices6 provided reliable estimates and uncertainty quantification. We decompose observed error using a recent analytic framework1 to explain the inaccuracy in the three surveys. We then analyse the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating for the former with the latter is a mathematically provable losing proposition.
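The closing claim, that a 250,000-respondent survey can match only a size-10 random sample, follows because a fixed bias does not shrink as the sample grows. A toy root-mean-square-error comparison, with the true uptake and the bias chosen for illustration rather than taken from the study:

```python
# Sketch: a large biased sample vs. a small unbiased one.
# p and bias are illustrative assumptions, not the study's estimates.

import math

p = 0.6       # hypothetical true first-dose uptake proportion
bias = 0.15   # hypothetical systematic survey bias (15 percentage points)

def rmse_biased(n):
    """RMSE of a biased proportion estimate from n responses."""
    sampling_sd = math.sqrt(p * (1 - p) / n)
    return math.sqrt(bias ** 2 + sampling_sd ** 2)

def rmse_srs(n):
    """RMSE of an unbiased simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

big_biased = rmse_biased(250_000)  # ~0.150: dominated by the bias term
tiny_srs = rmse_srs(10)            # ~0.155: pure sampling error
```

Under these assumed numbers the 250,000-response biased survey is barely more accurate than an honest random sample of 10, which is the Big Data Paradox in miniature.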


Subject(s)
COVID-19 Vaccines/administration & dosage, Health Care Surveys, Vaccination/statistics & numerical data, Benchmarking, Bias, Big Data, COVID-19/epidemiology, COVID-19/prevention & control, Centers for Disease Control and Prevention, U.S., Datasets as Topic/standards, Female, Health Care Surveys/standards, Humans, Male, Research Design, Sample Size, Social Media, United States/epidemiology, Vaccination Hesitancy/statistics & numerical data
3.
Nature ; 571(7765): 393-397, 2019 07.
Article in English | MEDLINE | ID: mdl-31316195

ABSTRACT

Existing estimates of sea surface temperatures (SSTs) indicate that, during the early twentieth century, the North Atlantic and northeast Pacific oceans warmed by twice the global average, whereas the northwest Pacific Ocean cooled by an amount equal to the global average1-4. Such a heterogeneous pattern suggests first-order contributions from regional variations in forcing or in ocean-atmosphere heat fluxes5,6. These older SST estimates are, however, derived from measurements of water temperatures in ship-board buckets, and must be corrected for substantial biases7-9. Here we show that correcting for offsets among groups of bucket measurements leads to SST variations that correlate better with nearby land temperatures and are more homogeneous in their pattern of warming. Offsets are identified by systematically comparing nearby SST observations among different groups10. Correcting for offsets in German measurements decreases warming rates in the North Atlantic, whereas correcting for Japanese measurement offsets leads to increased and more uniform warming in the North Pacific. Japanese measurement offsets in the 1930s primarily result from records having been truncated to whole degrees Celsius when the records were digitized in the 1960s. These findings underscore the fact that historical SST records reflect both physical and social dimensions in data collection, and suggest that further opportunities exist for improving the accuracy of historical SST records9,11.


Subject(s)
Datasets as Topic/standards, Global Warming/statistics & numerical data, Seawater/analysis, Temperature, Air/analysis, Atlantic Ocean, Datasets as Topic/history, Geographic Mapping, Germany, Global Warming/history, History, 20th Century, Japan, Pacific Ocean, Reproducibility of Results
5.
Histopathology ; 85(3): 418-436, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38719547

ABSTRACT

BACKGROUND AND OBJECTIVES: Current national or regional guidelines for the pathology reporting on invasive breast cancer differ in certain aspects, resulting in divergent reporting practice and a lack of comparability of data. Here we report on a new international dataset for the pathology reporting of resection specimens with invasive cancer of the breast. The dataset was produced under the auspices of the International Collaboration on Cancer Reporting (ICCR), a global alliance of major (inter-)national pathology and cancer organizations. METHODS AND RESULTS: The established ICCR process for dataset development was followed. An international expert panel consisting of breast pathologists, a surgeon, and an oncologist prepared a draft set of core and noncore data items based on a critical review and discussion of current evidence. Commentary was provided for each data item to explain the rationale for selecting it as a core or noncore element, its clinical relevance, and to highlight potential areas of disagreement or lack of evidence, in which case a consensus position was formulated. Following international public consultation, the document was finalized and ratified, and the dataset, which includes a synoptic reporting guide, was published on the ICCR website. CONCLUSIONS: This first international dataset for invasive cancer of the breast is intended to promote high-quality, standardized pathology reporting. Its widespread adoption will improve consistency of reporting, facilitate multidisciplinary communication, and enhance comparability of data, all of which will help to improve the management of invasive breast cancer patients.


Subject(s)
Breast Neoplasms, Humans, Breast Neoplasms/pathology, Female, Pathology, Clinical/standards, Datasets as Topic/standards
6.
Am J Hum Genet ; 106(5): 679-693, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32330416

ABSTRACT

Accurate construction of polygenic scores (PGS) can enable early diagnosis of diseases and facilitate the development of personalized medicine. Accurate PGS construction requires prediction models that are both adaptive to different genetic architectures and scalable to biobank scale datasets with millions of individuals and tens of millions of genetic variants. Here, we develop such a method called Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). DBSLMM relies on a flexible modeling assumption on the effect size distribution to achieve robust and accurate prediction performance across a range of genetic architectures. DBSLMM also relies on a simple deterministic search algorithm to yield an approximate analytic estimation solution using summary statistics only. The deterministic search algorithm, when paired with further algebraic innovations, results in substantial computational savings. With simulations, we show that DBSLMM achieves scalable and accurate prediction performance across a range of realistic genetic architectures. We then apply DBSLMM to analyze 25 traits in UK Biobank. For these traits, compared to existing approaches, DBSLMM achieves an average of 2.03%-101.09% accuracy gain in internal cross-validations. In external validations on two separate datasets, including one from BioBank Japan, DBSLMM achieves an average of 14.74%-522.74% accuracy gain. In these real data applications, DBSLMM is 1.03-28.11 times faster and uses only 7.4%-24.8% of physical memory as compared to other multiple regression-based PGS methods. Overall, DBSLMM represents an accurate and scalable method for constructing PGS in biobank scale datasets.


Subject(s)
Databases, Factual/standards, Datasets as Topic/standards, Multifactorial Inheritance, Bayes Theorem, Female, Humans, Linear Models, Male, Polymorphism, Single Nucleotide, Reproducibility of Results, Sample Size, United Kingdom, White People/genetics
7.
Am J Hum Genet ; 106(6): 846-858, 2020 06 04.
Article in English | MEDLINE | ID: mdl-32470372

ABSTRACT

The burden of several common diseases including obesity, diabetes, hypertension, asthma, and depression is increasing in most world populations. However, the mechanisms underlying the numerous epidemiological and genetic correlations among these disorders remain largely unknown. We investigated whether common polymorphic inversions underlie the shared genetic influence of these disorders. We performed an inversion association analysis including 21 inversions and 25 obesity-related traits on a total of 408,898 Europeans and validated the results in 67,299 independent individuals. Seven inversions were associated with multiple diseases while inversions at 8p23.1, 16p11.2, and 11q13.2 were strongly associated with the co-occurrence of obesity with other common diseases. Transcriptome analysis across numerous tissues revealed strong candidate genes for obesity-related traits. Analyses in human pancreatic islets indicated the potential mechanism of inversions in the susceptibility of diabetes by disrupting the cis-regulatory effect of SNPs from their target genes. Our data underscore the role of inversions as major genetic contributors to the joint susceptibility to common complex diseases.


Subject(s)
Chromosome Inversion/genetics, Diabetes Mellitus/genetics, Genetic Predisposition to Disease, Hypertension/genetics, Obesity/complications, Obesity/genetics, Polymorphism, Genetic, Adolescent, Adult, Aged, Aged, 80 and over, Alleles, Chromosomes, Human, Pair 16/genetics, Chromosomes, Human, Pair 8/genetics, Datasets as Topic/standards, Diabetes Mellitus/pathology, Europe/ethnology, Female, Gene Expression Profiling, Haplotypes, Humans, Hypertension/complications, Islets of Langerhans/metabolism, Islets of Langerhans/pathology, Male, Middle Aged, Polymorphism, Single Nucleotide/genetics, Reproducibility of Results, Young Adult
8.
Proc Natl Acad Sci U S A ; 117(23): 12592-12594, 2020 06 09.
Article in English | MEDLINE | ID: mdl-32457147

ABSTRACT

Artificial intelligence (AI) systems for computer-aided diagnosis and image-based screening are being adopted worldwide by medical institutions. In such a context, generating fair and unbiased classifiers becomes of paramount importance. The research community of medical image computing is making great efforts in developing more accurate algorithms to assist medical doctors in the difficult task of disease diagnosis. However, little attention is paid to the way databases are collected and how this may influence the performance of AI systems. Our study sheds light on the importance of gender balance in medical imaging datasets used to train AI systems for computer-assisted diagnosis. We provide empirical evidence supported by a large-scale study, based on three deep neural network architectures and two well-known publicly available X-ray image datasets used to diagnose various thoracic diseases under different gender imbalance conditions. We found a consistent decrease in performance for underrepresented genders when a minimum balance is not fulfilled. This raises the alarm for national agencies in charge of regulating and approving computer-assisted diagnosis systems, which should include explicit gender balance and diversity recommendations. We also establish an open problem for the academic medical image computing community which needs to be addressed by novel algorithms endowed with robustness to gender imbalance.
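The disaggregated evaluation the study argues for amounts to reporting classifier performance per subgroup rather than only overall. A minimal sketch, with group labels and data invented for illustration:

```python
# Sketch: per-group accuracy of a classifier, the kind of breakdown needed
# to detect underperformance on an underrepresented group. All data invented.

def per_group_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} computed separately for each group label."""
    stats = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        correct, total = stats.get(group, (0, 0))
        stats[group] = (correct + (truth == pred), total + 1)
    return {g: correct / total for g, (correct, total) in stats.items()}

acc = per_group_accuracy(
    y_true=[1, 0, 1, 1],
    y_pred=[1, 0, 0, 1],
    groups=["F", "F", "M", "M"],
)
# acc: {"F": 1.0, "M": 0.5} -> the model underperforms on group "M"
```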


Subject(s)
Datasets as Topic/standards, Deep Learning/standards, Radiographic Image Interpretation, Computer-Assisted/standards, Radiography, Thoracic/standards, Bias, Female, Humans, Male, Reference Standards, Sex Factors
9.
Ann Surg ; 275(3): e549-e561, 2022 03 01.
Article in English | MEDLINE | ID: mdl-34238814

ABSTRACT

OBJECTIVE: The aim of this study was to describe a new international dataset for pathology reporting of colorectal cancer surgical specimens, produced under the auspices of the International Collaboration on Cancer Reporting (ICCR). BACKGROUND: Quality of pathology reporting and mutual understanding between colorectal surgeon, pathologist and oncologist are vital to patient management. Some pathology parameters are prone to variable interpretation, resulting in differing positions adopted by existing national datasets. METHODS: The ICCR, a global alliance of major pathology institutions with links to international cancer organizations, has developed and ratified a rigorous and efficient process for the development of evidence-based, structured datasets for pathology reporting of common cancers. Here we describe the production of a dataset for colorectal cancer resection specimens by a multidisciplinary panel of internationally recognized experts. RESULTS: The agreed dataset comprises eighteen core (essential) and seven non-core (recommended) elements identified from a review of current evidence. Areas of contention are addressed, some highly relevant to surgical practice, with the aim of standardizing multidisciplinary discussion. The summation of all core elements is considered to be the minimum reporting standard for individual cases. Commentary is provided, explaining each element's clinical relevance, definitions to be applied where appropriate for the agreed list of value options and the rationale for considering the element as core or non-core. CONCLUSIONS: This first internationally agreed dataset for colorectal cancer pathology reporting promotes standardization of pathology reporting and enhanced clinicopathological communication. Widespread adoption will facilitate international comparisons, multinational clinical trials and help to improve the management of colorectal cancer globally.


Subject(s)
Colorectal Neoplasms/pathology, Datasets as Topic/standards, Research Design, Humans
10.
J Am Soc Nephrol ; 32(6): 1279-1292, 2021 06 01.
Article in English | MEDLINE | ID: mdl-33722930

ABSTRACT

Over the last 5 years, single cell methods have enabled the monitoring of gene and protein expression, genetic, and epigenetic changes in thousands of individual cells in a single experiment. With the improved measurement and the decreasing cost of the reactions and sequencing, the size of these datasets is increasing rapidly. The critical bottleneck remains the analysis of the wealth of information generated by single cell experiments. In this review, we give a simplified overview of the analysis pipelines, as they are typically used in the field today. We aim to enable researchers starting out in single cell analysis to gain an overview of challenges and the most commonly used analytical tools. In addition, we hope to empower others to gain an understanding of how typical readouts from single cell datasets are presented in the published literature.


Subject(s)
Data Analysis, Sequence Analysis, RNA, Single-Cell Analysis/methods, Software, Data Visualization, Datasets as Topic/standards, Gene Expression Profiling, Gene Expression Regulation, Genomics, Humans, Principal Component Analysis, Quality Control
11.
Alzheimers Dement ; 18(1): 29-42, 2022 01.
Article in English | MEDLINE | ID: mdl-33984176

ABSTRACT

INTRODUCTION: Harmonized neuropsychological assessment for neurocognitive disorders, an international priority for valid and reliable diagnostic procedures, has been achieved only in specific countries or research contexts. METHODS: To harmonize the assessment of mild cognitive impairment in Europe, a workshop (Geneva, May 2018) convened stakeholders, methodologists, academic and non-academic clinicians, and experts from European, US, and Australian harmonization initiatives. RESULTS: With formal presentations and thematic working groups, we defined a standard battery consistent with the U.S. Uniform DataSet, version 3, and a homogeneous methodology to obtain consistent normative data across tests and languages. Adaptations consist of including two tests specific to typical Alzheimer's disease and behavioral variant frontotemporal dementia. The methodology for harmonized normative data includes consensus definition of cognitively normal controls, classification of confounding factors (age, sex, and education), and calculation of minimum sample sizes. DISCUSSION: This expert consensus allows harmonizing the diagnosis of neurocognitive disorders across European countries and possibly beyond.


Subject(s)
Cognitive Dysfunction, Consensus Development Conferences as Topic, Datasets as Topic/standards, Neuropsychological Tests/standards, Age Factors, Cognition, Cognitive Dysfunction/classification, Cognitive Dysfunction/diagnosis, Educational Status, Europe, Expert Testimony, Humans, Language, Sex Factors
12.
Br J Cancer ; 125(2): 155-163, 2021 07.
Article in English | MEDLINE | ID: mdl-33850304

ABSTRACT

The complexity of neoplasia and its treatment are a challenge to the formulation of general criteria that are applicable across solid cancers. Determining the number of prior lines of therapy (LoT) is critically important for optimising future treatment, conducting medication audits, and assessing eligibility for clinical trial enrolment. Currently, however, no accepted set of criteria or definitions exists to enumerate LoT. In this article, we seek to open a dialogue to address this challenge by proposing a systematic and comprehensive framework to determine LoT uniformly across solid malignancies. First, key terms, including LoT and 'clinical progression of disease' are defined. Next, we clarify which therapies should be assigned a LoT, and why. Finally, we propose reporting LoT in a novel and standardised format as LoT N (CLoT + PLoT), where CLoT is the number of systemic anti-cancer therapies (SACT) administered with curative intent and/or in the early setting, PLoT is the number of SACT given with palliative intent and/or in the advanced setting, and N is the sum of CLoT and PLoT. As a next step, the cancer research community should develop and adopt standardised guidelines for enumerating LoT in a uniform manner.


Subject(s)
Clinical Decision-Making/methods, Neoplasms/therapy, Datasets as Topic/standards, Decision Support Systems, Clinical, Delphi Technique, Humans
13.
FASEB J ; 34(5): 6027-6037, 2020 05.
Article in English | MEDLINE | ID: mdl-32350928

ABSTRACT

There are currently no proven or approved treatments for coronavirus disease 2019 (COVID-19). Early anecdotal reports and limited in vitro data led to the significant uptake of hydroxychloroquine (HCQ), and to lesser extent chloroquine (CQ), for many patients with this disease. As an increasing number of patients with COVID-19 are treated with these agents and more evidence accumulates, there continues to be no high-quality clinical data showing a clear benefit of these agents for this disease. Moreover, these agents have the potential to cause harm, including a broad range of adverse events including serious cardiac side effects when combined with other agents. In addition, the known and potent immunomodulatory effects of these agents which support their use in the treatment of auto-immune conditions, and provided a component in the original rationale for their use in patients with COVID-19, may, in fact, undermine their utility in the context of the treatment of this respiratory viral infection. Specifically, the impact of HCQ on cytokine production and suppression of antigen presentation may have immunologic consequences that hamper innate and adaptive antiviral immune responses for patients with COVID-19. Similarly, the reported in vitro inhibition of viral proliferation is largely derived from the blockade of viral fusion that initiates infection rather than the direct inhibition of viral replication as seen with nucleoside/tide analogs in other viral infections. Given these facts and the growing uncertainty about these agents for the treatment of COVID-19, it is clear that at the very least thoughtful planning and data collection from randomized clinical trials are needed to understand what if any role these agents may have in this disease. In this article, we review the datasets that support or detract from the use of these agents for the treatment of COVID-19 and render a data informed opinion that they should only be used with caution and in the context of carefully thought out clinical trials, or on a case-by-case basis after rigorous consideration of the risks and benefits of this therapeutic approach.


Subject(s)
Coronavirus Infections/drug therapy, Hydroxychloroquine/adverse effects, Hydroxychloroquine/therapeutic use, Pneumonia, Viral/drug therapy, COVID-19, Datasets as Topic/standards, Heart/drug effects, Humans, Hydroxychloroquine/pharmacology, Immunity, Innate/drug effects, Pandemics, Randomized Controlled Trials as Topic/standards
14.
Future Oncol ; 17(15): 1865-1877, 2021 May.
Article in English | MEDLINE | ID: mdl-33629590

ABSTRACT

Retrospective observational research relies on databases that do not routinely record lines of therapy or reasons for treatment change. Standardized approaches to estimating lines of therapy were developed and evaluated in this study. A number of rules were developed, assumptions were varied, and macros were developed for application to large datasets. Results were investigated in an iterative process to refine line of therapy algorithms in three different cancers (lung, colorectal and gastric). Three primary factors were evaluated and included in the estimation of lines of therapy in oncology: defining a treatment regimen, addition/removal of drugs and gap periods. Algorithms and associated Statistical Analysis Software (SAS®) macros for line of therapy identification are provided to facilitate and standardize the use of real-world databases for oncology research.
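The study's actual rules are distributed as SAS macros; as a rough illustration only, a toy line-of-therapy assignment using two of the factors named above (a gap period and the addition of a new drug) might look like the sketch below. The 90-day gap and the advance-on-any-new-drug rule are assumptions made for this sketch, not the published algorithm:

```python
# Toy line-of-therapy (LoT) assignment from dated drug records, using two
# of the factors the study evaluates: a gap period and a newly added drug.
# The 90-day gap and the advance-on-any-new-drug rule are assumptions made
# for this sketch; they are not the published algorithm or its SAS macros.

from datetime import date

def assign_lines(records, gap_days=90):
    """records: (date, drug) tuples sorted by date; returns 1-based LoT numbers."""
    lines = []
    line, current_drugs, last_date = 1, set(), None
    for day, drug in records:
        long_gap = last_date is not None and (day - last_date).days > gap_days
        new_drug = bool(current_drugs) and drug not in current_drugs
        if long_gap or new_drug:
            line += 1
            current_drugs = set()
        current_drugs.add(drug)
        last_date = day
        lines.append(line)
    return lines

records = [
    (date(2020, 1, 1), "oxaliplatin"),
    (date(2020, 1, 15), "oxaliplatin"),   # same drug, short gap: still line 1
    (date(2020, 6, 1), "irinotecan"),     # long gap + new drug: line 2
]
```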


Lay abstract: Most, if not all, real-world healthcare databases do not contain data explaining treatment changes, requiring that rules be applied to estimate when treatment changes may reflect advancement of underlying disease. This study investigated three tumor types (lung, colorectal and gastric cancer) to develop and provide rules that researchers can apply to real-world databases. The resulting algorithms and associated SAS® macros from this work are provided for use in the Supplementary data.


Subject(s)
Antineoplastic Combined Chemotherapy Protocols/therapeutic use, Colorectal Neoplasms/drug therapy, Data Management/methods, Lung Neoplasms/drug therapy, Medical Oncology/standards, Stomach Neoplasms/drug therapy, Algorithms, Data Management/standards, Databases, Factual/standards, Databases, Factual/statistics & numerical data, Datasets as Topic/standards, Humans, Medical Oncology/statistics & numerical data, Observational Studies as Topic/standards, Observational Studies as Topic/statistics & numerical data, Retrospective Studies, Software
16.
Rheumatology (Oxford) ; 59(1): 137-145, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31243450

ABSTRACT

OBJECTIVES: Data collected during routine clinic visits are key to driving successful quality improvement in clinical services and enabling integration of research into routine care. The purpose of this study was to develop a standardized core dataset for juvenile idiopathic arthritis (JIA) (termed CAPTURE-JIA), enabling routine clinical collection of research-quality patient data useful to all relevant stakeholder groups (clinicians, service-providers, researchers, health service planners and patients/families) and including outcomes of relevance to patients/families. METHODS: Collaborative consensus-based approaches (including Delphi and World Café methodologies) were employed. The study was divided into discrete phases, including collaborative working with other groups developing relevant core datasets and a two-stage Delphi process, with the aim of rationalizing the initially long data item list to a clinically feasible size. RESULTS: The initial stage of the process identified collection of 297 discrete data items by one or more of fifteen NHS paediatric rheumatology centres. Following the two-stage Delphi process, culminating in a consensus workshop (May 2015), the final approved CAPTURE-JIA dataset consists of 62 discrete and defined clinical data items including novel JIA-specific patient-reported outcome and experience measures. CONCLUSIONS: CAPTURE-JIA is the first 'JIA core dataset' to include data items considered essential by key stakeholder groups engaged with leading and improving the clinical care of children and young people with JIA. Collecting essential patient information in a standard way is a major step towards improving the quality and consistency of clinical services, facilitating collaborative and effective working, benchmarking clinical services against quality indicators and aligning treatment strategies and clinical research opportunities.


Subject(s)
Arthritis, Juvenile, Datasets as Topic/standards, Delivery of Health Care/standards, Rheumatology/standards, Adolescent, Child, Consensus, Delphi Technique, Female, Humans, Intersectoral Collaboration, Male, Patient Reported Outcome Measures, Quality Improvement
17.
Genet Sel Evol ; 52(1): 38, 2020 Jul 08.
Article in English | MEDLINE | ID: mdl-32640985

ABSTRACT

BACKGROUND: We describe the latest improvements to the long-range phasing (LRP) and haplotype library imputation (HLI) algorithms for successful phasing of both datasets with one million individuals and datasets genotyped using different sets of single nucleotide polymorphisms (SNPs). Previous publicly available implementations of the LRP algorithm implemented in AlphaPhase could not phase large datasets due to the computational cost of defining surrogate parents by exhaustive all-against-all searches. Furthermore, the AlphaPhase implementations of LRP and HLI were not designed to deal with large amounts of missing data that are inherent when using multiple SNP arrays. METHODS: We developed methods that avoid the need for all-against-all searches by performing LRP on subsets of individuals and then concatenating the results. We also extended LRP and HLI algorithms to enable the use of different sets of markers, including missing values, when determining surrogate parents and identifying haplotypes. We implemented and tested these extensions in an updated version of AlphaPhase, and compared its performance to the software package Eagle2. RESULTS: A simulated dataset with one million individuals genotyped with the same 6711 SNPs for a single chromosome took less than a day to phase, compared to more than seven days for Eagle2. The percentage of correctly phased alleles at heterozygous loci was 90.2 and 99.9% for AlphaPhase and Eagle2, respectively. A larger dataset with one million individuals genotyped with 49,579 SNPs for a single chromosome took AlphaPhase 23 days to phase, with 89.9% of alleles at heterozygous loci phased correctly. The phasing accuracy was generally lower for datasets with different sets of markers than with one set of markers. For a simulated dataset with three sets of markers, 1.5% of alleles at heterozygous positions were phased incorrectly, compared to 0.4% with one set of markers. 
CONCLUSIONS: The improved LRP and HLI algorithms enable AlphaPhase to quickly and accurately phase very large and heterogeneous datasets. AlphaPhase is an order of magnitude faster than the other tested packages, although Eagle2 showed a higher level of phasing accuracy. The speed gain will make phasing achievable for very large genomic datasets in livestock, enabling more powerful breeding and genetics research and application.


Subject(s)
Algorithms, Datasets as Topic/standards, Genome-Wide Association Study/methods, Haplotypes, Animals, Genome-Wide Association Study/standards, Heterozygote, Livestock/genetics, Polymorphism, Single Nucleotide
19.
Public Health Nutr ; 23(11): 1889-1895, 2020 08.
Article in English | MEDLINE | ID: mdl-32295655

ABSTRACT

OBJECTIVE: Commercially available business (CAB) datasets for food environments have been investigated for error in large urban contexts and some rural areas, but there is a relative dearth of literature that reports error across regions of variable rurality. The objective of the current study was to assess the validity of a CAB dataset using a government dataset at the provincial scale. DESIGN: A ground-truthed dataset provided by the government of Newfoundland and Labrador (NL) was used to assess a popular commercial dataset. Concordance, sensitivity, positive-predictive value (PPV) and geocoding errors were calculated. Measures were stratified by store types and rurality to investigate any association between these variables and database accuracy. SETTING: NL, Canada. PARTICIPANTS: The current analysis used store-level (ecological) data. RESULTS: Of 1125 stores, there were 380 stores that existed in both datasets and were considered true-positive stores. The mean positional error between a ground-truthed and test point was 17·72 km. When compared with the provincial dataset of businesses, grocery stores had the greatest agreement, sensitivity = 0·64, PPV = 0·60 and concordance = 0·45. Gas stations had the least agreement, sensitivity = 0·26, PPV = 0·32 and concordance = 0·17. Only 4 % of commercial data points in rural areas matched every criterion examined. CONCLUSIONS: The commercial dataset exhibits a low level of agreement with the ground-truthed provincial data. Particularly retailers in rural areas or belonging to the gas station category suffered from misclassification and/or geocoding errors. Taken together, the commercial dataset is differentially representative of the ground-truthed reality based on store-type and rurality/urbanity.
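For reference, the agreement figures quoted above are consistent with the standard definitions sensitivity = TP/(TP+FN), positive-predictive value = TP/(TP+FP), and a Jaccard-style concordance = TP/(TP+FP+FN). A minimal sketch, with counts invented so that they reproduce the grocery-store figures when rounded:

```python
# Sketch of the three agreement measures from true-positive (tp),
# false-positive (fp) and false-negative (fn) counts. The counts below are
# invented for illustration; they are not the study's data.

def agreement(tp, fp, fn):
    sensitivity = tp / (tp + fn)       # found among all ground-truth stores
    ppv = tp / (tp + fp)               # true among all commercially listed stores
    concordance = tp / (tp + fp + fn)  # Jaccard-style overall agreement
    return sensitivity, ppv, concordance

sens, ppv, conc = agreement(tp=100, fp=67, fn=56)
# rounds to sensitivity 0.64, PPV 0.60, concordance 0.45
```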


Subject(s)
Commerce/statistics & numerical data, Datasets as Topic/standards, Food Supply/statistics & numerical data, Rural Population/statistics & numerical data, Social Environment, Databases, Factual, Government, Humans, Newfoundland and Labrador, Predictive Value of Tests, Reproducibility of Results, Sensitivity and Specificity, Urban Population/statistics & numerical data
20.
BMC Palliat Care ; 19(1): 89, 2020 Jun 23.
Article in English | MEDLINE | ID: mdl-32576171

ABSTRACT

BACKGROUND: There is an increased interest in the analysis of large, national palliative care data sets including patient reported outcomes (PROs). No study has investigated if it was best to include or exclude data from services with low response rates in order to obtain the patient reported outcomes most representative of the national palliative care population. Thus, the aim of this study was to investigate whether services with low response rates should be excluded from analyses to prevent effects of possible selection bias. METHODS: Data from the Danish Palliative Care Database from 24,589 specialized palliative care admittances of cancer patients was included. Patients reported ten aspects of quality of life using the EORTC QLQ-C15-PAL-questionnaire. Multiple linear regression was performed to test if response rate was associated with the ten aspects of quality of life. RESULTS: The score of six quality of life aspects were significantly associated with response rate. However, in only two cases patients from specialized palliative care services with lower response rates (< 20.0%, 20.0-29.9%, 30.0-39.9%, 40.0-49.9% or 50.0-59.9%) were feeling better than patients from services with high response rates (≥60%) and in both cases it was less than 2 points on a 0-100 scale. CONCLUSIONS: The study hypothesis, that patients from specialized palliative care services with lower response rates were reporting better quality of life than those from specialized palliative care services with high response rates, was not supported. This suggests that there is no reason to exclude data from specialized palliative care services with low response rates.


Subject(s)
Data Accuracy, Datasets as Topic/trends, Palliative Care/statistics & numerical data, Patient Reported Outcome Measures, Registries/statistics & numerical data, Adult, Datasets as Topic/standards, Female, Humans, Male, Middle Aged, Palliative Care/methods, Quality of Health Care/standards, Quality of Health Care/statistics & numerical data, Research Subjects/statistics & numerical data, Surveys and Questionnaires