Search | VHL Regional Portal

The validity of electronic health data for measuring smoking status: a systematic review and meta-analysis.

Haque, Md Ashiqul; Gedara, Muditha Lakmali Bodawatte; Nickel, Nathan; Turgeon, Maxime; Lix, Lisa M.

BMC Med Inform Decis Mak ; 24(1): 33, 2024 Feb 02.

Article in English | MEDLINE | ID: mdl-38308231

ABSTRACT

BACKGROUND: Smoking is a risk factor for many chronic diseases. Multiple smoking status ascertainment algorithms have been developed for population-based electronic health databases such as administrative databases and electronic medical records (EMRs). Evidence syntheses of algorithm validation studies have often focused on chronic diseases rather than risk factors. We conducted a systematic review and meta-analysis of smoking status ascertainment algorithms to describe the characteristics and validity of these algorithms. METHODS: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines were followed. We searched articles published from 1990 to 2022 in EMBASE, MEDLINE, Scopus, and Web of Science with key terms such as validity, administrative data, electronic health records, smoking, and tobacco use. The extracted information, including article characteristics, algorithm characteristics, and validity measures, was descriptively analyzed. Sources of heterogeneity in validity measures were estimated using a meta-regression model. Risk of bias (ROB) in the reviewed articles was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. RESULTS: The initial search yielded 2086 articles; 57 were selected for review and 116 algorithms were identified. Almost three-quarters (71.6%) of algorithms were based on EMR data. The algorithms were primarily constructed using diagnosis codes for smoking-related conditions, although prescription medication codes for smoking treatments were also adopted. About half of the algorithms were developed using machine-learning models. The pooled estimates of positive predictive value, sensitivity, and specificity were 0.843, 0.672, and 0.918 respectively. Algorithm sensitivity and specificity were highly variable and ranged from 3 to 100% and 36 to 100%, respectively. Model-based algorithms had significantly greater sensitivity (p = 0.006) than rule-based algorithms. Algorithms for EMR data had higher sensitivity than algorithms for administrative data (p = 0.001). The ROB was low in most of the articles (76.3%) that underwent the assessment. CONCLUSIONS: Multiple algorithms using different data sources and methods have been proposed to ascertain smoking status in electronic health data. Many algorithms had low sensitivity and positive predictive value, but the data source influenced their validity. Algorithms based on machine-learning models for multiple linked data sources have improved validity.

Subjects

Electronic Health Records , Smoking , Humans , Predictive Value of Tests , Sensitivity and Specificity , Smoking/epidemiology , Algorithms , Chronic Disease
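
The validity measures pooled in the abstract above (positive predictive value, sensitivity, specificity) are all simple ratios of confusion-matrix counts. The following is a minimal sketch of how they could be computed for a single algorithm validated against a reference standard. The function name `validity_measures` and the counts in the usage example are hypothetical illustrations, not data or code from the reviewed studies.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def validity_measures(tp, fp, fn, tn):
    """Compute validity measures from confusion-matrix counts.

    tp: smokers correctly flagged as smokers by the algorithm
    fp: non-smokers wrongly flagged as smokers
    fn: smokers missed by the algorithm
    tn: non-smokers correctly left unflagged
    """
    return {
        # Share of true smokers that the algorithm detects.
        "sensitivity": _ratio(tp, tp + fn),
        # Share of true non-smokers that the algorithm leaves unflagged.
        "specificity": _ratio(tn, tn + fp),
        # Share of flagged individuals who really are smokers.
        "ppv": _ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical validation counts, chosen only for illustration.
    measures = validity_measures(tp=80, fp=20, fn=40, tn=860)
    for name, value in measures.items():
        print(f"{name}: {value:.3f}")
```

An algorithm can combine high specificity with low sensitivity, as in this made-up example, which is the pattern the abstract reports for many of the reviewed algorithms.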

A Novel SARS-CoV-2 Viral Sequence Bioinformatic Pipeline Has Found Genetic Evidence That the Viral 3' Untranslated Region (UTR) Is Evolving and Generating Increased Viral Diversity.

Farkas, Carlos; Mella, Andy; Turgeon, Maxime; Haigh, Jody J.

Front Microbiol ; 12: 665041, 2021.

Article in English | MEDLINE | ID: mdl-34234758

ABSTRACT

An unprecedented amount of SARS-CoV-2 sequencing has been performed; however, novel bioinformatic tools to cope with and process these large datasets are needed. Here, we have devised a bioinformatic pipeline that takes SARS-CoV-2 genome sequencing data in FASTA/FASTQ format as input and outputs a single Variant Call Format (VCF) file that can be processed to obtain variant annotations and perform downstream population genetic testing. As proof of concept, we have analyzed over 229,000 SARS-CoV-2 viral sequences up until November 30, 2020. We have identified over 39,000 variants worldwide with increased polymorphisms, spanning the ORF3a gene as well as the 3' untranslated regions (UTRs), specifically in the conserved stem loop region of SARS-CoV-2, which is accumulating greater observed viral diversity relative to chance variation. Our analysis pipeline has also discovered the existence of SARS-CoV-2 hypermutation with low frequency (in less than 2% of genomes), likely arising through host immune responses and not due to sequencing errors. Among annotated non-sense variants with a population frequency over 1%, recurrent inactivation of the ORF8 gene was found. This was found to be present in the newly identified B.1.1.7 SARS-CoV-2 lineage that originated in the United Kingdom. Almost all VOC-containing genomes possess one stop codon in the ORF8 gene (Q27*); however, 13% of these genomes also contain another stop codon (K68*), suggesting that ORF8 loss does not interfere with SARS-CoV-2 spread and may play a role in its increased virulence. We have developed this computational pipeline to assist researchers in the rapid analysis and characterization of SARS-CoV-2 variation.

Principal component of explained variance: An efficient and optimal data dimension reduction framework for association studies.

Turgeon, Maxime; Oualkacha, Karim; Ciampi, Antonio; Miftah, Hanane; Dehghan, Golsa; Zanke, Brent W; Benedet, Andréa L; Rosa-Neto, Pedro; Greenwood, Celia Mt; Labbe, Aurélie.

Stat Methods Med Res ; 27(5): 1331-1350, 2018 May.

Article in English | MEDLINE | ID: mdl-27460538

ABSTRACT

The genomics era has led to an increase in the dimensionality of data collected in the investigation of biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, to further test for association with one or more covariates of interest. This paper revisits one such approach, previously known as principal component of heritability and renamed here as principal component of explained variance (PCEV). As its name suggests, the PCEV seeks a linear combination of outcomes in an optimal manner, by maximising the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, due to its computational complexity, it has unfortunately received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, i.e. conceptually simple and free of tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated using three examples taken from the fields of epigenetics and brain imaging.

Subjects

Analysis of Variance , Principal Component Analysis/methods , Computer Simulation , DNA Methylation , Statistical Data Interpretation , Genes/genetics , Humans , Statistical Models , Multivariate Analysis , Neuroimaging/statistics & numerical data
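
The optimisation described in the PCEV abstract above (finding the linear combination of outcomes that maximises the proportion of variance explained by the covariates) can be written compactly. The formulation below is a sketch under assumed notation that does not appear in the abstract: $Y$ is the vector of outcomes, $X$ the covariates of interest, $V_M$ the part of the outcome variance explained by $X$, and $V_R$ the residual variance.

```latex
% Assumed notation (not from the abstract): Y outcomes, X covariates,
% B regression coefficients, E residuals independent of X.
\begin{align*}
  Y &= BX + E, \\
  \operatorname{Var}(Y) &= V_M + V_R,
    \qquad V_M = B \operatorname{Var}(X) B^{\top},
    \quad V_R = \operatorname{Var}(E), \\
  w_{\mathrm{PCEV}} &= \arg\max_{w} \; h^2(w),
    \qquad h^2(w) = \frac{w^{\top} V_M w}{w^{\top} (V_M + V_R)\, w}.
\end{align*}
% Any maximiser solves the generalised eigenvalue problem
%   V_M w = \lambda V_R w
% for the largest eigenvalue \lambda, in which case
%   h^2(w) = \lambda / (1 + \lambda).
```

Written this way, the criterion has no tuning parameters, which matches the abstract's description of the method as conceptually simple.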

A Mendelian randomization study of the effect of type-2 diabetes on coronary heart disease.

Ahmad, Omar S; Morris, John A; Mujammami, Muhammad; Forgetta, Vincenzo; Leong, Aaron; Li, Rui; Turgeon, Maxime; Greenwood, Celia M T; Thanassoulis, George; Meigs, James B; Sladek, Robert; Richards, J Brent.

Nat Commun ; 6: 7060, 2015 May 28.

Article in English | MEDLINE | ID: mdl-26017687

ABSTRACT

In observational studies, type-2 diabetes (T2D) is associated with an increased risk of coronary heart disease (CHD), yet interventional trials have shown no clear effect of glucose-lowering on CHD. Confounding may have therefore influenced these observational estimates. Here we use Mendelian randomization to obtain unconfounded estimates of the influence of T2D and fasting glucose (FG) on CHD risk. Using multiple genetic variants associated with T2D and FG, we find that risk of T2D increases CHD risk (odds ratio (OR)=1.11 (1.05-1.17), per unit increase in odds of T2D, P=8.8 × 10⁻⁵; using data from 34,840/114,981 T2D cases/controls and 63,746/130,681 CHD cases/controls). FG in non-diabetic individuals tends to increase CHD risk (OR=1.15 (1.00-1.32), per mmol/l, P=0.05; 133,010 non-diabetic individuals and 63,746/130,681 CHD cases/controls). These findings provide evidence supporting a causal relationship between T2D and CHD and suggest that long-term trials may be required to discern the effects of T2D therapies on CHD risk.

Subjects

Blood Glucose/genetics , Coronary Disease/genetics , Type 2 Diabetes Mellitus/genetics , Adult , Aged , Blood Glucose/metabolism , Case-Control Studies , Causality , Coronary Disease/epidemiology , Type 2 Diabetes Mellitus/epidemiology , Type 2 Diabetes Mellitus/metabolism , Glycated Hemoglobin/genetics , Glycated Hemoglobin/metabolism , Humans , Mendelian Randomization Analysis , Middle Aged , Odds Ratio , Single Nucleotide Polymorphism , Risk Factors
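
The abstract above combines multiple genetic variants into a single causal estimate but does not state which estimator was used. One common choice for summary-level Mendelian randomization is the inverse-variance weighted (IVW) combination of per-variant Wald ratios. The sketch below illustrates that general technique only. The function name `ivw_estimate` and all numbers are hypothetical and are not taken from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted Mendelian randomization estimate.

    beta_exposure: per-variant effects on the exposure (e.g. log-odds of T2D)
    beta_outcome:  per-variant effects on the outcome (e.g. log-odds of CHD)
    se_outcome:    standard errors of the outcome effects
    Returns (estimate, standard_error) on the log-odds scale.
    """
    # Wald ratio for each variant: outcome effect per unit of exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order inverse-variance weight of each Wald ratio.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


if __name__ == "__main__":
    # Hypothetical summary statistics for three variants (illustration only).
    bx = [0.10, 0.20, 0.15]
    by = [0.012, 0.018, 0.016]
    se = [0.005, 0.006, 0.005]
    est, se_est = ivw_estimate(bx, by, se)
    low, high = est - 1.96 * se_est, est + 1.96 * se_est
    print(f"OR = {math.exp(est):.3f} "
          f"({math.exp(low):.3f}-{math.exp(high):.3f})")
```

Exponentiating the estimate and its confidence limits gives an odds ratio in the same form as the one reported in the abstract.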
