Pesquisa | BVS Doenças Infecciosas e Parasitárias

1.

Highly accurate protein structure prediction for the human proteome.

Tunyasuvunakool, Kathryn; Adler, Jonas; Wu, Zachary; Green, Tim; Zielinski, Michal; Zídek, Augustin; Bridgland, Alex; Cowie, Andrew; Meyer, Clemens; Laydon, Agata; Velankar, Sameer; Kleywegt, Gerard J; Bateman, Alex; Evans, Richard; Pritzel, Alexander; Figurnov, Michael; Ronneberger, Olaf; Bates, Russ; Kohl, Simon A A; Potapenko, Anna; Ballard, Andrew J; Romera-Paredes, Bernardino; Nikolov, Stanislav; Jain, Rishub; Clancy, Ellen; Reiman, David; Petersen, Stig; Senior, Andrew W; Kavukcuoglu, Koray; Birney, Ewan; Kohli, Pushmeet; Jumper, John; Hassabis, Demis.

Nature ; 596(7873): 590-596, 2021 08.

Artigo em Inglês | MEDLINE | ID: mdl-34293799

RESUMO

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure1. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.

Assuntos

Biologia Computacional/normas , Aprendizado Profundo/normas , Modelos Moleculares , Conformação Proteica , Proteoma/química , Conjuntos de Dados como Assunto/normas , Diacilglicerol O-Aciltransferase/química , Glucose-6-Fosfatase/química , Humanos , Proteínas de Membrana/química , Dobramento de Proteína , Reprodutibilidade dos Testes

2.

Unrepresentative big surveys significantly overestimated US vaccine uptake.

Bradley, Valerie C; Kuriwaki, Shiro; Isakov, Michael; Sejdinovic, Dino; Meng, Xiao-Li; Flaxman, Seth.

Nature ; 600(7890): 695-700, 2021 12.

Artigo em Inglês | MEDLINE | ID: mdl-34880504

RESUMO

Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox1. Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi-Facebook2,3 (about 250,000 responses per week) and Census Household Pulse4 (about 75,000 every two weeks). In May 2021, Delphi-Facebook overestimated uptake by 17 percentage points (14-20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 (11-17 percentage points with 5% benchmark imprecision), compared to a retroactively updated benchmark the Centers for Disease Control and Prevention published on 26 May 2021. Moreover, their large sample sizes led to miniscule margins of error on the incorrect estimates. By contrast, an Axios-Ipsos online panel5 with about 1,000 responses per week following survey research best practices6 provided reliable estimates and uncertainty quantification. We decompose observed error using a recent analytic framework1 to explain the inaccuracy in the three surveys. We then analyse the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating the former with the latter is a mathematically provable losing proposition.

Assuntos

Vacinas contra COVID-19/administração & dosagem , Pesquisas sobre Atenção à Saúde , Vacinação/estatística & dados numéricos , Benchmarking , Viés , Big Data , COVID-19/epidemiologia , COVID-19/prevenção & controle , Centers for Disease Control and Prevention, U.S. , Conjuntos de Dados como Assunto/normas , Feminino , Pesquisas sobre Atenção à Saúde/normas , Humanos , Masculino , Projetos de Pesquisa , Tamanho da Amostra , Mídias Sociais , Estados Unidos/epidemiologia , Hesitação Vacinal/estatística & dados numéricos

3.

Correcting datasets leads to more homogeneous early-twentieth-century sea surface warming.

Chan, Duo; Kent, Elizabeth C; Berry, David I; Huybers, Peter.

Nature ; 571(7765): 393-397, 2019 07.

Artigo em Inglês | MEDLINE | ID: mdl-31316195

RESUMO

Existing estimates of sea surface temperatures (SSTs) indicate that, during the early twentieth century, the North Atlantic and northeast Pacific oceans warmed by twice the global average, whereas the northwest Pacific Ocean cooled by an amount equal to the global average1-4. Such a heterogeneous pattern suggests first-order contributions from regional variations in forcing or in ocean-atmosphere heat fluxes5,6. These older SST estimates are, however, derived from measurements of water temperatures in ship-board buckets, and must be corrected for substantial biases7-9. Here we show that correcting for offsets among groups of bucket measurements leads to SST variations that correlate better with nearby land temperatures and are more homogeneous in their pattern of warming. Offsets are identified by systematically comparing nearby SST observations among different groups10. Correcting for offsets in German measurements decreases warming rates in the North Atlantic, whereas correcting for Japanese measurement offsets leads to increased and more uniform warming in the North Pacific. Japanese measurement offsets in the 1930s primarily result from records having been truncated to whole degrees Celsius when the records were digitized in the 1960s. These findings underscore the fact that historical SST records reflect both physical and social dimensions in data collection, and suggest that further opportunities exist for improving the accuracy of historical SST records9,11.

Assuntos

Conjuntos de Dados como Assunto/normas , Aquecimento Global/estatística & dados numéricos , Água do Mar/análise , Temperatura , Ar/análise , Oceano Atlântico , Conjuntos de Dados como Assunto/história , Mapeamento Geográfico , Alemanha , Aquecimento Global/história , História do Século XX , Japão , Oceano Pacífico , Reprodutibilidade dos Testes

4.

Adaptive Immune Receptor Repertoire Community recommendations for sharing immune-repertoire sequencing data.

Rubelt, Florian; Busse, Christian E; Bukhari, Syed Ahmad Chan; Bürckert, Jean-Philippe; Mariotti-Ferrandiz, Encarnita; Cowell, Lindsay G; Watson, Corey T; Marthandan, Nishanth; Faison, William J; Hershberg, Uri; Laserson, Uri; Corrie, Brian D; Davis, Mark M; Peters, Bjoern; Lefranc, Marie-Paule; Scott, Jamie K; Breden, Felix; Luning Prak, Eline T; Kleinstein, Steven H.

Nat Immunol ; 18(12): 1274-1278, 2017 11 16.

Artigo em Inglês | MEDLINE | ID: mdl-29144493

Assuntos

Conjuntos de Dados como Assunto/normas , Disseminação de Informação , Receptores Imunológicos/genética , Projetos de Pesquisa/normas , Animais , Humanos

5.

Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets.

Yang, Sheng; Zhou, Xiang.

Am J Hum Genet ; 106(5): 679-693, 2020 05 07.

Artigo em Inglês | MEDLINE | ID: mdl-32330416

RESUMO

Accurate construction of polygenic scores (PGS) can enable early diagnosis of diseases and facilitate the development of personalized medicine. Accurate PGS construction requires prediction models that are both adaptive to different genetic architectures and scalable to biobank scale datasets with millions of individuals and tens of millions of genetic variants. Here, we develop such a method called Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). DBSLMM relies on a flexible modeling assumption on the effect size distribution to achieve robust and accurate prediction performance across a range of genetic architectures. DBSLMM also relies on a simple deterministic search algorithm to yield an approximate analytic estimation solution using summary statistics only. The deterministic search algorithm, when paired with further algebraic innovations, results in substantial computational savings. With simulations, we show that DBSLMM achieves scalable and accurate prediction performance across a range of realistic genetic architectures. We then apply DBSLMM to analyze 25 traits in UK Biobank. For these traits, compared to existing approaches, DBSLMM achieves an average of 2.03%-101.09% accuracy gain in internal cross-validations. In external validations on two separate datasets, including one from BioBank Japan, DBSLMM achieves an average of 14.74%-522.74% accuracy gain. In these real data applications, DBSLMM is 1.03-28.11 times faster and uses only 7.4%-24.8% of physical memory as compared to other multiple regression-based PGS methods. Overall, DBSLMM represents an accurate and scalable method for constructing PGS in biobank scale datasets.

Assuntos

Bases de Dados Factuais/normas , Conjuntos de Dados como Assunto/normas , Herança Multifatorial , Teorema de Bayes , Feminino , Humanos , Modelos Lineares , Masculino , Polimorfismo de Nucleotídeo Único , Reprodutibilidade dos Testes , Tamanho da Amostra , Reino Unido , População Branca/genética

6.

Polymorphic Inversions Underlie the Shared Genetic Susceptibility of Obesity-Related Diseases.

González, Juan R; Ruiz-Arenas, Carlos; Cáceres, Alejandro; Morán, Ignasi; López-Sánchez, Marcos; Alonso, Lorena; Tolosana, Ignacio; Guindo-Martínez, Marta; Mercader, Josep M; Esko, Tonu; Torrents, David; González, Josefa; Pérez-Jurado, Luis A.

Am J Hum Genet ; 106(6): 846-858, 2020 06 04.

Artigo em Inglês | MEDLINE | ID: mdl-32470372

RESUMO

The burden of several common diseases including obesity, diabetes, hypertension, asthma, and depression is increasing in most world populations. However, the mechanisms underlying the numerous epidemiological and genetic correlations among these disorders remain largely unknown. We investigated whether common polymorphic inversions underlie the shared genetic influence of these disorders. We performed an inversion association analysis including 21 inversions and 25 obesity-related traits on a total of 408,898 Europeans and validated the results in 67,299 independent individuals. Seven inversions were associated with multiple diseases while inversions at 8p23.1, 16p11.2, and 11q13.2 were strongly associated with the co-occurrence of obesity with other common diseases. Transcriptome analysis across numerous tissues revealed strong candidate genes for obesity-related traits. Analyses in human pancreatic islets indicated the potential mechanism of inversions in the susceptibility of diabetes by disrupting the cis-regulatory effect of SNPs from their target genes. Our data underscore the role of inversions as major genetic contributors to the joint susceptibility to common complex diseases.

Assuntos

Inversão Cromossômica/genética , Diabetes Mellitus/genética , Predisposição Genética para Doença , Hipertensão/genética , Obesidade/complicações , Obesidade/genética , Polimorfismo Genético , Adolescente , Adulto , Idoso , Idoso de 80 Anos ou mais , Alelos , Cromossomos Humanos Par 16/genética , Cromossomos Humanos Par 8/genética , Conjuntos de Dados como Assunto/normas , Diabetes Mellitus/patologia , Europa (Continente)/etnologia , Feminino , Perfilação da Expressão Gênica , Haplótipos , Humanos , Hipertensão/complicações , Ilhotas Pancreáticas/metabolismo , Ilhotas Pancreáticas/patologia , Masculino , Pessoa de Meia-Idade , Polimorfismo de Nucleotídeo Único/genética , Reprodutibilidade dos Testes , Adulto Jovem

7.

Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis.

Larrazabal, Agostina J; Nieto, Nicolás; Peterson, Victoria; Milone, Diego H; Ferrante, Enzo.

Proc Natl Acad Sci U S A ; 117(23): 12592-12594, 2020 06 09.

Artigo em Inglês | MEDLINE | ID: mdl-32457147

RESUMO

Artificial intelligence (AI) systems for computer-aided diagnosis and image-based screening are being adopted worldwide by medical institutions. In such a context, generating fair and unbiased classifiers becomes of paramount importance. The research community of medical image computing is making great efforts in developing more accurate algorithms to assist medical doctors in the difficult task of disease diagnosis. However, little attention is paid to the way databases are collected and how this may influence the performance of AI systems. Our study sheds light on the importance of gender balance in medical imaging datasets used to train AI systems for computer-assisted diagnosis. We provide empirical evidence supported by a large-scale study, based on three deep neural network architectures and two well-known publicly available X-ray image datasets used to diagnose various thoracic diseases under different gender imbalance conditions. We found a consistent decrease in performance for underrepresented genders when a minimum balance is not fulfilled. This raises the alarm for national agencies in charge of regulating and approving computer-assisted diagnosis systems, which should include explicit gender balance and diversity recommendations. We also establish an open problem for the academic medical image computing community which needs to be addressed by novel algorithms endowed with robustness to gender imbalance.

Assuntos

Conjuntos de Dados como Assunto/normas , Aprendizado Profundo/normas , Interpretação de Imagem Radiográfica Assistida por Computador/normas , Radiografia Torácica/normas , Viés , Feminino , Humanos , Masculino , Padrões de Referência , Fatores Sexuais

8.

Dataset for Pathology Reporting of Colorectal Cancer: Recommendations From the International Collaboration on Cancer Reporting (ICCR).

Loughrey, Maurice B; Webster, Fleur; Arends, Mark J; Brown, Ian; Burgart, Lawrence J; Cunningham, Chris; Flejou, Jean-Francois; Kakar, Sanjay; Kirsch, Richard; Kojima, Motohiro; Lugli, Alessandro; Rosty, Christophe; Sheahan, Kieran; West, Nicholas P; Wilson, Richard H; Nagtegaal, Iris D.

Ann Surg ; 275(3): e549-e561, 2022 03 01.

Artigo em Inglês | MEDLINE | ID: mdl-34238814

RESUMO

OBJECTIVE: The aim of this study to describe a new international dataset for pathology reporting of colorectal cancer surgical specimens, produced under the auspices of the International Collaboration on Cancer Reporting (ICCR). BACKGROUND: Quality of pathology reporting and mutual understanding between colorectal surgeon, pathologist and oncologist are vital to patient management. Some pathology parameters are prone to variable interpretation, resulting in differing positions adopted by existing national datasets. METHODS: The ICCR, a global alliance of major pathology institutions with links to international cancer organizations, has developed and ratified a rigorous and efficient process for the development of evidence-based, structured datasets for pathology reporting of common cancers. Here we describe the production of a dataset for colorectal cancer resection specimens by a multidisciplinary panel of internationally recognized experts. RESULTS: The agreed dataset comprises eighteen core (essential) and seven non-core (recommended) elements identified from a review of current evidence. Areas of contention are addressed, some highly relevant to surgical practice, with the aim of standardizing multidisciplinary discussion. The summation of all core elements is considered to be the minimum reporting standard for individual cases. Commentary is provided, explaining each element's clinical relevance, definitions to be applied where appropriate for the agreed list of value options and the rationale for considering the element as core or non-core. CONCLUSIONS: This first internationally agreed dataset for colorectal cancer pathology reporting promotes standardization of pathology reporting and enhanced clinicopathological communication. Widespread adoption will facilitate international comparisons, multinational clinical trials and help to improve the management of colorectal cancer globally.

Assuntos

Neoplasias Colorretais/patologia , Conjuntos de Dados como Assunto/normas , Projetos de Pesquisa , Humanos

9.

How to Get Started with Single Cell RNA Sequencing Data Analysis.

Balzer, Michael S; Ma, Ziyuan; Zhou, Jianfu; Abedini, Amin; Susztak, Katalin.

J Am Soc Nephrol ; 32(6): 1279-1292, 2021 06 01.

Artigo em Inglês | MEDLINE | ID: mdl-33722930

RESUMO

Over the last 5 years, single cell methods have enabled the monitoring of gene and protein expression, genetic, and epigenetic changes in thousands of individual cells in a single experiment. With the improved measurement and the decreasing cost of the reactions and sequencing, the size of these datasets is increasing rapidly. The critical bottleneck remains the analysis of the wealth of information generated by single cell experiments. In this review, we give a simplified overview of the analysis pipelines, as they are typically used in the field today. We aim to enable researchers starting out in single cell analysis to gain an overview of challenges and the most commonly used analytical tools. In addition, we hope to empower others to gain an understanding of how typical readouts from single cell datasets are presented in the published literature.

Assuntos

Análise de Dados , Análise de Sequência de RNA , Análise de Célula Única/métodos , Software , Visualização de Dados , Conjuntos de Dados como Assunto/normas , Perfilação da Expressão Gênica , Regulação da Expressão Gênica , Genômica , Humanos , Análise de Componente Principal , Controle de Qualidade

10.

Harmonizing neuropsychological assessment for mild neurocognitive disorders in Europe.

Boccardi, Marina; Monsch, Andreas U; Ferrari, Clarissa; Altomare, Daniele; Berres, Manfred; Bos, Isabelle; Buchmann, Andreas; Cerami, Chiara; Didic, Mira; Festari, Cristina; Nicolosi, Valentina; Sacco, Leonardo; Aerts, Liesbeth; Albanese, Emiliano; Annoni, Jean-Marie; Ballhausen, Nicola; Chicherio, Christian; Démonet, Jean-François; Descloux, Virginie; Diener, Suzie; Ferreira, Daniel; Georges, Jean; Gietl, Anton; Girtler, Nicola; Kilimann, Ingo; Klöppel, Stefan; Kustyniuk, Nicole; Mecocci, Patrizia; Mella, Nathalie; Pigliautile, Martina; Seeher, Katrin; Shirk, Steven D; Toraldo, Alessio; Brioschi-Guevara, Andrea; Chan, Kwun C G; Crane, Paul K; Dodich, Alessandra; Grazia, Alice; Kochan, Nicole A; de Oliveira, Fabricio Ferreira; Nobili, Flavio; Kukull, Walter; Peters, Oliver; Ramakers, Inez; Sachdev, Perminder S; Teipel, Stefan; Visser, Pieter Jelle; Wagner, Michael; Weintraub, Sandra; Westman, Eric.

Alzheimers Dement ; 18(1): 29-42, 2022 01.

Artigo em Inglês | MEDLINE | ID: mdl-33984176

RESUMO

INTRODUCTION: Harmonized neuropsychological assessment for neurocognitive disorders, an international priority for valid and reliable diagnostic procedures, has been achieved only in specific countries or research contexts. METHODS: To harmonize the assessment of mild cognitive impairment in Europe, a workshop (Geneva, May 2018) convened stakeholders, methodologists, academic, and non-academic clinicians and experts from European, US, and Australian harmonization initiatives. RESULTS: With formal presentations and thematic working-groups we defined a standard battery consistent with the U.S. Uniform DataSet, version 3, and homogeneous methodology to obtain consistent normative data across tests and languages. Adaptations consist of including two tests specific to typical Alzheimer's disease and behavioral variant frontotemporal dementia. The methodology for harmonized normative data includes consensus definition of cognitively normal controls, classification of confounding factors (age, sex, and education), and calculation of minimum sample sizes. DISCUSSION: This expert consensus allows harmonizing the diagnosis of neurocognitive disorders across European countries and possibly beyond.

Assuntos

Disfunção Cognitiva , Conferências de Consenso como Assunto , Conjuntos de Dados como Assunto/normas , Testes Neuropsicológicos/normas , Fatores Etários , Cognição , Disfunção Cognitiva/classificação , Disfunção Cognitiva/diagnóstico , Escolaridade , Europa (Continente) , Prova Pericial , Humanos , Idioma , Fatores Sexuais

11.

Determining lines of therapy in patients with solid cancers: a proposed new systematic and comprehensive framework.

Saini, Kamal S; Twelves, Chris.

Br J Cancer ; 125(2): 155-163, 2021 07.

Artigo em Inglês | MEDLINE | ID: mdl-33850304

RESUMO

The complexity of neoplasia and its treatment are a challenge to the formulation of general criteria that are applicable across solid cancers. Determining the number of prior lines of therapy (LoT) is critically important for optimising future treatment, conducting medication audits, and assessing eligibility for clinical trial enrolment. Currently, however, no accepted set of criteria or definitions exists to enumerate LoT. In this article, we seek to open a dialogue to address this challenge by proposing a systematic and comprehensive framework to determine LoT uniformly across solid malignancies. First, key terms, including LoT and 'clinical progression of disease' are defined. Next, we clarify which therapies should be assigned a LoT, and why. Finally, we propose reporting LoT in a novel and standardised format as LoT N (CLoT + PLoT), where CLoT is the number of systemic anti-cancer therapies (SACT) administered with curative intent and/or in the early setting, PLoT is the number of SACT given with palliative intent and/or in the advanced setting, and N is the sum of CLoT and PLoT. As a next step, the cancer research community should develop and adopt standardised guidelines for enumerating LoT in a uniform manner.

Assuntos

Tomada de Decisão Clínica/métodos , Neoplasias/terapia , Conjuntos de Dados como Assunto/normas , Sistemas de Apoio a Decisões Clínicas , Técnica Delphi , Humanos

12.

Rethinking the role of hydroxychloroquine in the treatment of COVID-19.

Meyerowitz, Eric A; Vannier, Augustin G L; Friesen, Morgan G N; Schoenfeld, Sara; Gelfand, Jeffrey A; Callahan, Michael V; Kim, Arthur Y; Reeves, Patrick M; Poznansky, Mark C.

FASEB J ; 34(5): 6027-6037, 2020 05.

Artigo em Inglês | MEDLINE | ID: mdl-32350928

RESUMO

There are currently no proven or approved treatments for coronavirus disease 2019 (COVID-19). Early anecdotal reports and limited in vitro data led to the significant uptake of hydroxychloroquine (HCQ), and to lesser extent chloroquine (CQ), for many patients with this disease. As an increasing number of patients with COVID-19 are treated with these agents and more evidence accumulates, there continues to be no high-quality clinical data showing a clear benefit of these agents for this disease. Moreover, these agents have the potential to cause harm, including a broad range of adverse events including serious cardiac side effects when combined with other agents. In addition, the known and potent immunomodulatory effects of these agents which support their use in the treatment of auto-immune conditions, and provided a component in the original rationale for their use in patients with COVID-19, may, in fact, undermine their utility in the context of the treatment of this respiratory viral infection. Specifically, the impact of HCQ on cytokine production and suppression of antigen presentation may have immunologic consequences that hamper innate and adaptive antiviral immune responses for patients with COVID-19. Similarly, the reported in vitro inhibition of viral proliferation is largely derived from the blockade of viral fusion that initiates infection rather than the direct inhibition of viral replication as seen with nucleoside/tide analogs in other viral infections. Given these facts and the growing uncertainty about these agents for the treatment of COVID-19, it is clear that at the very least thoughtful planning and data collection from randomized clinical trials are needed to understand what if any role these agents may have in this disease. In this article, we review the datasets that support or detract from the use of these agents for the treatment of COVID-19 and render a data informed opinion that they should only be used with caution and in the context of carefully thought out clinical trials, or on a case-by-case basis after rigorous consideration of the risks and benefits of this therapeutic approach.

Assuntos

Infecções por Coronavirus/tratamento farmacológico , Hidroxicloroquina/efeitos adversos , Hidroxicloroquina/uso terapêutico , Pneumonia Viral/tratamento farmacológico , COVID-19 , Conjuntos de Dados como Assunto/normas , Coração/efeitos dos fármacos , Humanos , Hidroxicloroquina/farmacologia , Imunidade Inata/efeitos dos fármacos , Pandemias , Ensaios Clínicos Controlados Aleatórios como Assunto/normas

13.

Defining treatment regimens and lines of therapy using real-world data in oncology.

Hess, Lisa M; Li, Xiaohong; Wu, Yixun; Goodloe, Robert J; Cui, Zhanglin Lin.

Future Oncol ; 17(15): 1865-1877, 2021 May.

Artigo em Inglês | MEDLINE | ID: mdl-33629590

RESUMO

Retrospective observational research relies on databases that do not routinely record lines of therapy or reasons for treatment change. Standardized approaches to estimate lines of therapy were developed and evaluated in this study. A number of rules were developed, assumptions varied and macros developed to apply to large datasets. Results were investigated in an iterative process to refine line of therapy algorithms in three different cancers (lung, colorectal and gastric). Three primary factors were evaluated and included in the estimation of lines of therapy in oncology: defining a treatment regimen, addition/removal of drugs and gap periods. Algorithms and associated Statistical Analysis Software (SAS®) macros for line of therapy identification are provided to facilitate and standardize the use of real-world databases for oncology research.

Lay abstract Most, if not all, real-world healthcare databases do not contain data explaining treatment changes, requiring that rules be applied to estimate when treatment changes may reflect advancement of underlying disease. This study investigated three tumor types (lung, colorectal and gastric cancer) to develop and provide rules that researchers can apply to real-world databases. The resulting algorithms and associated SAS^® macros from this work are provided for use in the Supplementary data.

Assuntos

Protocolos de Quimioterapia Combinada Antineoplásica/uso terapêutico , Neoplasias Colorretais/tratamento farmacológico , Gerenciamento de Dados/métodos , Neoplasias Pulmonares/tratamento farmacológico , Oncologia/normas , Neoplasias Gástricas/tratamento farmacológico , Algoritmos , Gerenciamento de Dados/normas , Bases de Dados Factuais/normas , Bases de Dados Factuais/estatística & dados numéricos , Conjuntos de Dados como Assunto/normas , Humanos , Oncologia/estatística & dados numéricos , Estudos Observacionais como Assunto/normas , Estudos Observacionais como Assunto/estatística & dados numéricos , Estudos Retrospectivos , Software

14.

Invest 5% of research funds in ensuring data are reusable.

Mons, Barend.

Nature ; 578(7796): 491, 2020 02.

Artigo em Inglês | MEDLINE | ID: mdl-32099131

Assuntos

Conjuntos de Dados como Assunto/economia , Conjuntos de Dados como Assunto/provisão & distribuição , Disseminação de Informação/métodos , Investimentos em Saúde/economia , Projetos de Pesquisa/normas , Apoio à Pesquisa como Assunto/economia , Pesquisa/economia , Inteligência Artificial/economia , Inteligência Artificial/tendências , Computação em Nuvem , Análise de Dados , Conjuntos de Dados como Assunto/normas , União Europeia/economia , Guias como Assunto , Disseminação de Informação/ética , Países Baixos , Projetos de Pesquisa/tendências , Pesquisadores

15.

CAPTURE-JIA: a consensus-derived core dataset to improve clinical care for children and young people with juvenile idiopathic arthritis.

McErlane, Flora; Armitt, Gillian; Cobb, Joanna; Bailey, Kathryn; Cleary, Gavin; Douglas, Sharon; Lunt, Laura; Rashid, Amir; Sampath, Sunil; Shoop-Worrall, Stephanie; Smith, Nicola; Foster, Helen; Thomson, Wendy.

Rheumatology (Oxford) ; 59(1): 137-145, 2020 01 01.

Artigo em Inglês | MEDLINE | ID: mdl-31243450

RESUMO

OBJECTIVES: Data collected during routine clinic visits are key to driving successful quality improvement in clinical services and enabling integration of research into routine care. The purpose of this study was to develop a standardized core dataset for juvenile idiopathic arthritis (JIA) (termed CAPTURE-JIA), enabling routine clinical collection of research-quality patient data useful to all relevant stakeholder groups (clinicians, service-providers, researchers, health service planners and patients/families) and including outcomes of relevance to patients/families. METHODS: Collaborative consensus-based approaches (including Delphi and World Café methodologies) were employed. The study was divided into discrete phases, including collaborative working with other groups developing relevant core datasets and a two-stage Delphi process, with the aim of rationalizing the initially long data item list to a clinically feasible size. RESULTS: The initial stage of the process identified collection of 297 discrete data items by one or more of fifteen NHS paediatric rheumatology centres. Following the two-stage Delphi process, culminating in a consensus workshop (May 2015), the final approved CAPTURE-JIA dataset consists of 62 discrete and defined clinical data items including novel JIA-specific patient-reported outcome and experience measures. CONCLUSIONS: CAPTURE-JIA is the first 'JIA core dataset' to include data items considered essential by key stakeholder groups engaged with leading and improving the clinical care of children and young people with JIA. Collecting essential patient information in a standard way is a major step towards improving the quality and consistency of clinical services, facilitating collaborative and effective working, benchmarking clinical services against quality indicators and aligning treatment strategies and clinical research opportunities.

Assuntos

Artrite Juvenil , Conjuntos de Dados como Assunto/normas , Atenção à Saúde/normas , Reumatologia/normas , Adolescente , Criança , Consenso , Técnica Delphi , Feminino , Humanos , Colaboração Intersetorial , Masculino , Medidas de Resultados Relatados pelo Paciente , Melhoria de Qualidade

16.

Extending long-range phasing and haplotype library imputation algorithms to large and heterogeneous datasets.

Money, Daniel; Wilson, David; Jenko, Janez; Whalen, Andrew; Thorn, Steve; Gorjanc, Gregor; Hickey, John M.

Genet Sel Evol ; 52(1): 38, 2020 Jul 08.

Artigo em Inglês | MEDLINE | ID: mdl-32640985

RESUMO

BACKGROUND: We describe the latest improvements to the long-range phasing (LRP) and haplotype library imputation (HLI) algorithms for successful phasing of both datasets with one million individuals and datasets genotyped using different sets of single nucleotide polymorphisms (SNPs). Previous publicly available implementations of the LRP algorithm implemented in AlphaPhase could not phase large datasets due to the computational cost of defining surrogate parents by exhaustive all-against-all searches. Furthermore, the AlphaPhase implementations of LRP and HLI were not designed to deal with large amounts of missing data that are inherent when using multiple SNP arrays. METHODS: We developed methods that avoid the need for all-against-all searches by performing LRP on subsets of individuals and then concatenating the results. We also extended LRP and HLI algorithms to enable the use of different sets of markers, including missing values, when determining surrogate parents and identifying haplotypes. We implemented and tested these extensions in an updated version of AlphaPhase, and compared its performance to the software package Eagle2. RESULTS: A simulated dataset with one million individuals genotyped with the same 6711 SNPs for a single chromosome took less than a day to phase, compared to more than seven days for Eagle2. The percentage of correctly phased alleles at heterozygous loci was 90.2 and 99.9% for AlphaPhase and Eagle2, respectively. A larger dataset with one million individuals genotyped with 49,579 SNPs for a single chromosome took AlphaPhase 23 days to phase, with 89.9% of alleles at heterozygous loci phased correctly. The phasing accuracy was generally lower for datasets with different sets of markers than with one set of markers. For a simulated dataset with three sets of markers, 1.5% of alleles at heterozygous positions were phased incorrectly, compared to 0.4% with one set of markers. CONCLUSIONS: The improved LRP and HLI algorithms enable AlphaPhase to quickly and accurately phase very large and heterogeneous datasets. AlphaPhase is an order of magnitude faster than the other tested packages, although Eagle2 showed a higher level of phasing accuracy. The speed gain will make phasing achievable for very large genomic datasets in livestock, enabling more powerful breeding and genetics research and application.

Assuntos

Algoritmos , Conjuntos de Dados como Assunto/normas , Estudo de Associação Genômica Ampla/métodos , Haplótipos , Animais , Estudo de Associação Genômica Ampla/normas , Heterozigoto , Gado/genética , Polimorfismo de Nucleotídeo Único

17.

Make scientific data FAIR.

Stall, Shelley; Yarmey, Lynn; Cutcher-Gershenfeld, Joel; Hanson, Brooks; Lehnert, Kerstin; Nosek, Brian; Parsons, Mark; Robinson, Erin; Wyborn, Lesley.

Nature ; 570(7759): 27-29, 2019 06.

Artigo em Inglês | MEDLINE | ID: mdl-31164768

Assuntos

Conjuntos de Dados como Assunto/normas , Ciências da Terra/estatística & dados numéricos , Disseminação de Informação/métodos , Armazenamento e Recuperação da Informação/métodos , Armazenamento e Recuperação da Informação/normas , Publicação de Acesso Aberto/normas , Reciclagem/normas , Conjuntos de Dados como Assunto/economia , Conjuntos de Dados como Assunto/tendências , Ciências da Terra/economia , Armazenamento e Recuperação da Informação/economia , Armazenamento e Recuperação da Informação/tendências , Metadados/normas , Metadados/estatística & dados numéricos , Meteorologia/economia , Meteorologia/estatística & dados numéricos , Publicação de Acesso Aberto/economia , Publicação de Acesso Aberto/tendências , Reciclagem/economia , Reciclagem/métodos , Reciclagem/tendências , Reprodutibilidade dos Testes

18.

Validation of a province-wide commercial food store dataset in a heterogeneous predominantly rural food environment.

Taylor, Nathan Ga; Stymest, Jillian; Mah, Catherine L.

Public Health Nutr ; 23(11): 1889-1895, 2020 08.

Artigo em Inglês | MEDLINE | ID: mdl-32295655

RESUMO

OBJECTIVE: Commercially available business (CAB) datasets for food environments have been investigated for error in large urban contexts and some rural areas, but there is a relative dearth of literature that reports error across regions of variable rurality. The objective of the current study was to assess the validity of a CAB dataset using a government dataset at the provincial scale. DESIGN: A ground-truthed dataset provided by the government of Newfoundland and Labrador (NL) was used to assess a popular commercial dataset. Concordance, sensitivity, positive-predictive value (PPV) and geocoding errors were calculated. Measures were stratified by store types and rurality to investigate any association between these variables and database accuracy. SETTING: NL, Canada. PARTICIPANTS: The current analysis used store-level (ecological) data. RESULTS: Of 1125 stores, there were 380 stores that existed in both datasets and were considered true-positive stores. The mean positional error between a ground-truthed and test point was 17·72 km. When compared with the provincial dataset of businesses, grocery stores had the greatest agreement, sensitivity = 0·64, PPV = 0·60 and concordance = 0·45. Gas stations had the least agreement, sensitivity = 0·26, PPV = 0·32 and concordance = 0·17. Only 4 % of commercial data points in rural areas matched every criterion examined. CONCLUSIONS: The commercial dataset exhibits a low level of agreement with the ground-truthed provincial data. Particularly retailers in rural areas or belonging to the gas station category suffered from misclassification and/or geocoding errors. Taken together, the commercial dataset is differentially representative of the ground-truthed reality based on store-type and rurality/urbanity.

Assuntos

Comércio/estatística & dados numéricos , Conjuntos de Dados como Assunto/normas , Abastecimento de Alimentos/estatística & dados numéricos , População Rural/estatística & dados numéricos , Meio Social , Bases de Dados Factuais , Governo , Humanos , Terra Nova e Labrador , Valor Preditivo dos Testes , Reprodutibilidade dos Testes , Sensibilidade e Especificidade , População Urbana/estatística & dados numéricos

19.

Should analyses of large, national palliative care data sets with patient reported outcomes (PROs) be restricted to services with high patient participation? A register-based study.

Hansen, Maiken Bang; Petersen, Morten Aagaard; Ross, Lone; Groenvold, Mogens.

BMC Palliat Care ; 19(1): 89, 2020 Jun 23.

Artigo em Inglês | MEDLINE | ID: mdl-32576171

RESUMO

BACKGROUND: There is an increased interest in the analysis of large, national palliative care data sets including patient reported outcomes (PROs). No study has investigated if it was best to include or exclude data from services with low response rates in order to obtain the patient reported outcomes most representative of the national palliative care population. Thus, the aim of this study was to investigate whether services with low response rates should be excluded from analyses to prevent effects of possible selection bias. METHODS: Data from the Danish Palliative Care Database from 24,589 specialized palliative care admittances of cancer patients was included. Patients reported ten aspects of quality of life using the EORTC QLQ-C15-PAL-questionnaire. Multiple linear regression was performed to test if response rate was associated with the ten aspects of quality of life. RESULTS: The score of six quality of life aspects were significantly associated with response rate. However, in only two cases patients from specialized palliative care services with lower response rates (< 20.0%, 20.0-29.9%, 30.0-39.9%, 40.0-49.9% or 50.0-59.9) were feeling better than patients from services with high response rates (≥60%) and in both cases it was less than 2 points on a 0-100 scale. CONCLUSIONS: The study hypothesis, that patients from specialized palliative care services with lower response rates were reporting better quality of life than those from specialized palliative care services with high response rates, was not supported. This suggests that there is no reason to exclude data from specialized palliative care services with low response rates.

Assuntos

Confiabilidade dos Dados , Conjuntos de Dados como Assunto/tendências , Cuidados Paliativos/estatística & dados numéricos , Medidas de Resultados Relatados pelo Paciente , Sistema de Registros/estatística & dados numéricos , Adulto , Conjuntos de Dados como Assunto/normas , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Cuidados Paliativos/métodos , Qualidade da Assistência à Saúde/normas , Qualidade da Assistência à Saúde/estatística & dados numéricos , Sujeitos da Pesquisa/estatística & dados numéricos , Inquéritos e Questionários

20.

Prediction of Protein-Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets.

Xie, Zengyan; Deng, Xiaoya; Shu, Kunxian.

Int J Mol Sci ; 21(2)2020 Jan 11.

Artigo em Inglês | MEDLINE | ID: mdl-31940793

RESUMO

Protein-protein interaction (PPI) sites play a key role in the formation of protein complexes, which is the basis of a variety of biological processes. Experimental methods to solve PPI sites are expensive and time-consuming, which has led to the development of different kinds of prediction algorithms. We propose a convolutional neural network for PPI site prediction and use residue binding propensity to improve the positive samples. Our method obtains a remarkable result of the area under the curve (AUC) = 0.912 on the improved data set. In addition, it yields much better results on samples with high binding propensity than on randomly selected samples. This suggests that there are considerable false-positive PPI sites in the positive samples defined by the distance between residue atoms.

Assuntos

Redes Neurais de Computação , Mapeamento de Interação de Proteínas/métodos , Animais , Sítios de Ligação , Conjuntos de Dados como Assunto/normas , Humanos , Ligação Proteica , Mapeamento de Interação de Proteínas/normas , Reprodutibilidade dos Testes

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA