ABSTRACT
PURPOSE: Biomedical databases combining electronic medical records and phenotypic and genomic data constitute a powerful resource for the personalization of treatment. To leverage the wealth of information provided, algorithms are required that systematically translate the contained information into treatment recommendations based on existing genotype-phenotype associations. METHODS: We developed and tested algorithms for translation of preexisting genotype data of over 44,000 participants of the Estonian biobank into pharmacogenetic recommendations. We compared the results obtained by genome sequencing, exome sequencing, and genotyping using microarrays, and evaluated the impact of pharmacogenetic reporting based on drug prescription statistics in the Nordic countries and Estonia. RESULTS: Our most striking result was that the performance of genotyping arrays is similar to that of genome sequencing, whereas exome sequencing is not suitable for pharmacogenetic predictions. Interestingly, 99.8% of all assessed individuals had a genotype associated with increased risks to at least one medication, and thereby the implementation of pharmacogenetic recommendations based on genotyping affects at least 50 daily drug doses per 1000 inhabitants. CONCLUSION: We find that microarrays are a cost-effective solution for creating preemptive pharmacogenetic reports, and with slight modifications, existing databases can be applied for automated pharmacogenetic decision support for clinicians.
Subjects
Pharmacogenetics/methods , Pharmacogenomic Variants/genetics , Sequence Analysis, DNA/methods , Algorithms , Biological Specimen Banks , Databases, Factual , Electronic Health Records , Estonia , Genetic Testing/standards , Genotype , Humans , Oligonucleotide Array Sequence Analysis/methods , Pharmacogenomic Testing/methods , Phenotype , Precision Medicine/methods
ABSTRACT
Allele-specific analyses to understand frequency differences across populations, particularly populations not well studied, are important to help identify variants that may have a functional effect on disease mechanisms and phenotypic predisposition, facilitating new Genome-Wide Association Studies (GWAS). We aimed to compare the allele frequency of 11 asthma-associated and 16 liver disease-associated single nucleotide polymorphisms (SNPs) between the Estonian (Estonian Genome Center, University of Tartu; EGCUT), HapMap and 1000 Genomes Project populations. When comparing EGCUT with HapMap populations, the largest difference in allele frequencies was observed with the Maasai population in Kinyawa, Kenya, with 12 SNP variants reaching statistical significance. Similarly, when comparing EGCUT with 1000 Genomes Project populations, the largest difference in allele frequencies was observed with pooled African populations, with 22 SNP variants reaching statistical significance. For 11 asthma-associated and 16 liver disease-associated SNPs, Estonians are genetically similar to other European populations but significantly different from African populations. Understanding differences in genetic architecture between ethnic populations is important to facilitate new GWAS targeted at underserved ethnic groups to enable novel genetic findings to aid the development of new therapies to reduce morbidity and mortality.
Subjects
Asthma/genetics , Gene Frequency/genetics , Genetics, Population , Genome, Human , HapMap Project , Liver Diseases/genetics , Polymorphism, Single Nucleotide/genetics , Estonia , Humans
ABSTRACT
Functional enrichment analysis is a key step in interpreting gene lists discovered in diverse high-throughput experiments. g:Profiler studies flat and ranked gene lists and finds statistically significant Gene Ontology terms, pathways and other gene function related terms. Translation of hundreds of gene identifiers is another core feature of g:Profiler. Since its first publication in 2007, our web server has become a popular tool of choice among basic and translational researchers. Timeliness is a major advantage of g:Profiler as genome and pathway information is synchronized with the Ensembl database in quarterly updates. g:Profiler supports 213 species including mammals and other vertebrates, plants, insects and fungi. The 2016 update of g:Profiler introduces several novel features. We have added further functional datasets to interpret gene lists, including transcription factor binding site predictions, Mendelian disease annotations, information about protein expression and complexes and gene mappings of human genetic polymorphisms. Besides the interactive web interface, g:Profiler can be accessed in computational pipelines using our R package, Python interface and BioJS component. g:Profiler is freely available at http://biit.cs.ut.ee/gprofiler/.
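As an illustration of what over-representation testing of a flat gene list involves, the following is a minimal sketch of an upper-tail hypergeometric test. It is a generic textbook calculation, not g:Profiler's implementation; the function name, parameter names and the example counts are assumptions made for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_query, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of genes in the background set
    n_annotated:  background genes annotated to the term
    n_query:      genes in the query list
    n_overlap:    query genes annotated to the term
    """
    max_overlap = min(n_annotated, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_query - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 5 of 50 query genes hit a term that annotates
# 100 of 20,000 background genes.
p = enrichment_p_value(20000, 100, 50, 5)
```

A small p-value here only says that the overlap is larger than expected by chance for one term; testing many terms would additionally require a multiple-testing correction.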
Subjects
Gene Expression Regulation , Gene Ontology , Transcription Factors/genetics , User-Computer Interface , Animals , Binding Sites , Computer Graphics , Fungi/genetics , Gene Expression Profiling , Humans , Insects/genetics , Internet , Molecular Sequence Annotation , Plants/genetics , Protein Binding , Transcription Factors/metabolism , Vertebrates/genetics
ABSTRACT
BACKGROUND: Modern activity trackers, including the Fitbit Zip, enable the measurement of both step count and physical activity (PA) intensities. However, there is a need for field-based validation studies in a variety of populations before using trackers for research. Therefore, the purpose of the current study was to investigate the validity of Fitbit Zip step count, moderate-to-vigorous physical activity (MVPA) and sedentary minutes in different school segments among 3rd grade students. METHODS: Third grade students (N = 147, aged 9-10 years) wore a Fitbit Zip and an ActiGraph GT3x-BT accelerometer simultaneously on a belt for five days during school hours. The number of steps, minutes of MVPA and sedentary time during class time, physical education lessons and recess were extracted from both devices using time filters, based on the information from school timetables obtained from class teachers. The validity of the Fitbit Zip in different school segments was assessed using Bland-Altman analysis and Spearman's correlation. RESULTS: There was a strong correlation in the number of steps in all in-school segments between the two devices (r = 0.85-0.96, P < 0.001). The Fitbit Zip overestimated the number of steps in all segments, with the greatest overestimation being present in physical education lessons (345 steps). As for PA intensities, the agreement between the two devices in physical education and recess was moderate for MVPA minutes (r = 0.56 and r = 0.72, P < 0.001, respectively) and strong for sedentary time (r = 0.85 and r = 0.87, P < 0.001, respectively). During class time, the correlation was weak for MVPA minutes (r = 0.24, P < 0.001) and moderate for sedentary time (r = 0.57, P < 0.001). For total in-school time, the correlation between the two devices was strong for steps (r = 0.98, P < 0.001), MVPA (r = 0.80, P < 0.001) and sedentary time (r = 0.94, P < 0.001).
CONCLUSION: In general, the Fitbit Zip can be considered a relatively accurate device for measuring the number of steps, MVPA and sedentary time in students in a school-setting. However, in segments where sedentary time dominates (e.g. academic classes), a research-grade accelerometer should be preferred.
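For readers unfamiliar with the Bland-Altman analysis mentioned in the methods, the sketch below shows the core calculation: the mean difference between two devices (bias) and the 95% limits of agreement. The function name and the step counts are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def bland_altman(device_a, device_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences (a - b); the limits of
    agreement are bias +/- 1.96 sample standard deviations of the differences.
    """
    diffs = [a - b for a, b in zip(device_a, device_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical step counts for three children (tracker vs. accelerometer).
tracker = [10, 12, 14]
accelerometer = [9, 10, 11]
bias, lower, upper = bland_altman(tracker, accelerometer)
```

A positive bias in this convention means the first device reads higher on average, which is how an overestimation of steps would appear.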
Subjects
Actigraphy/instrumentation , Exercise , Fitness Trackers/standards , Child , Cross-Sectional Studies , Female , Humans , Male , Monitoring, Ambulatory/instrumentation , Physical Education and Training , Reproducibility of Results , Students
ABSTRACT
OBJECTIVE: To introduce 2 R-packages that facilitate conducting health economics research on OMOP-based data networks, aiming to standardize and improve the reproducibility, transparency, and transferability of health economic models. MATERIALS AND METHODS: We developed the software tools and demonstrated their utility by replicating a UK-based heart failure data analysis across 5 different international databases from Estonia, Spain, Serbia, and the United States. RESULTS: We examined treatment trajectories of 47 163 patients. The overall incremental cost-effectiveness ratio (ICER) for telemonitoring relative to standard of care was 57 472 /QALY. Country-specific ICERs were 60 312 /QALY in Estonia, 58 096 /QALY in Spain, 40 372 /QALY in Serbia, and 90 893 /QALY in the US, which surpassed the established willingness-to-pay thresholds. DISCUSSION: Currently, the cost-effectiveness analysis lacks standard tools, is performed in ad-hoc manner, and relies heavily on published information that might not be specific for local circumstances. Published results often exhibit a narrow focus, central to a single site, and provide only partial decision criteria, limiting their generalizability and comprehensive utility. CONCLUSION: We created 2 R-packages to pioneer cost-effectiveness analysis in OMOP CDM data networks. The first manages state definitions and database interaction, while the second focuses on Markov model learning and profile synthesis. We demonstrated their utility in a multisite heart failure study, comparing telemonitoring and standard care, finding telemonitoring not cost-effective.
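The incremental cost-effectiveness ratio reported above is the extra cost of an intervention divided by the extra quality-adjusted life years (QALYs) it yields. The sketch below illustrates that arithmetic and the comparison with a willingness-to-pay threshold. It is written in Python rather than R, it is not the authors' packages, and every number and name in it is a made-up assumption.

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost per QALY gained by the new strategy."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return (cost_new - cost_old) / delta_qaly


def is_cost_effective(ratio, willingness_to_pay):
    """True when the cost per QALY does not exceed the threshold."""
    return ratio <= willingness_to_pay


# Hypothetical inputs: the new strategy costs 2000 more and adds 0.5 QALY.
ratio = icer(cost_new=12000, cost_old=10000, qaly_new=5.5, qaly_old=5.0)
verdict = is_cost_effective(ratio, willingness_to_pay=3000)
```

This simple threshold rule assumes the new strategy is both more costly and more effective; dominated or cost-saving strategies need separate handling.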
Subjects
Cost-Effectiveness Analysis , Heart Failure , Humans , United States , Cost-Benefit Analysis , Reproducibility of Results , Models, Economic , Heart Failure/therapy , Markov Chains
ABSTRACT
Importance: Large-scale data on type-specific human papillomavirus (HPV) prevalence and disease burden worldwide are needed to guide cervical cancer prevention efforts. Promoting the research and application of health care big data has become a key factor in modern medical research. Objective: To examine the prevaccination prevalence of high-risk HPV (hrHPV) and type distribution by cervical cytology grade in Estonia. Design, Setting, and Participants: This cross-sectional study used text mining and the linking of data from electronic health records and health care claims to examine type-specific hrHPV positivity in Estonia from 2012 to 2019. Participants were women aged at least 18 years. Statistical analysis was performed from September 2021 to August 2022. Main Outcomes and Measures: Type-specific hrHPV positivity rate by cervical cytological grade. Results: A total of 11 017 cases of cervical cytology complemented with data on hrHPV testing results between 2012 and 2019 from 66 451 women aged at least 18 years (mean [SD] age, 48.1 [21.0] years) were included. The most common hrHPV types were HPV16, 18, 31, 33, 51 and 52, which accounted for 73.8% of all hrHPV types detected. There was a marked decline in the positivity rate of hrHPV infection with increasing age, but the proportion did not vary significantly based on HPV type. Implementation of nonavalent prophylactic vaccination was estimated to reduce the number of women with high-grade cytology by 50.5% (95% CI, 47.4%-53.6%) and the number with low-grade cytology by 27.8% (95% CI, 26.3%-29.3%), giving an overall estimated reduction of 33.1% (95% CI, 31.7%-34.5%) in the number of women with precancerous cervical cytology findings. Conclusions and Relevance: In this cross-sectional study, text mining and natural language processing techniques allowed the detection of precursors to cervical cancer based on data stored by the nationwide health system.
These findings contribute to the literature on type-specific HPV distribution by cervical cytology grade and document that α-9 phylogenetic group HPV types 16, 31, 33, 52 and α-7 phylogenetic group HPV 18 are the most frequently detected in normal-to-high-grade precancerous lesions in Estonia.
Subjects
Papillomavirus Infections , Uterine Cervical Neoplasms , Adult , Female , Humans , Middle Aged , Cross-Sectional Studies , Estonia/epidemiology , Human papillomavirus 16 , Human Papillomavirus , Papillomavirus Infections/diagnosis , Papillomavirus Infections/epidemiology , Papillomavirus Infections/prevention & control , Phylogeny , Prevalence , Uterine Cervical Neoplasms/diagnosis , Uterine Cervical Neoplasms/epidemiology , Uterine Cervical Neoplasms/prevention & control
ABSTRACT
Objective: To describe the reusable transformation process of electronic health records (EHR), claims, and prescriptions data into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), together with challenges faced and solutions implemented. Materials and Methods: We used Estonian national health databases that store almost all residents' claims, prescriptions, and EHR records. To develop and demonstrate the transformation process of Estonian health data to OMOP CDM, we used a 10% random sample of the Estonian population (n = 150 824 patients) from 2012 to 2019 (MAITT dataset). For the sample, complete information from all 3 databases was converted to OMOP CDM version 5.3. The validation was performed using open-source tools. Results: In total, we transformed over 100 million entries to standard concepts using standard OMOP vocabularies, with an average mapping rate of 95%. For conditions, observations, drugs, and measurements, the mapping rate was over 90%. In most cases, SNOMED Clinical Terms were used as the target vocabulary. Discussion: During the transformation process, we encountered several challenges, which are described in detail with concrete examples and solutions. Conclusion: For a representative 10% random sample, we successfully transferred complete records from 3 national health databases to OMOP CDM and created a reusable transformation process. Our work helps future researchers to transform linked databases into OMOP CDM more efficiently, ultimately leading to better real-world evidence.
ABSTRACT
The return of individual genomic results (ROR) to research participants is still in its early phase, and insight on how individuals respond to ROR is scarce. Studies contributing to the evidence base for best practices are crucial before these can be established. Here, we describe a ROR procedure conducted at a population-based biobank, followed by surveying the responses of almost 3000 participants to a range of results, and discuss lessons learned from the process, with the aim of facilitating large-scale expansion. Overall, participants perceived the information that they received with counseling as valuable, even when the reporting of high risks initially caused worry. The face-to-face delivery of results limited the number of participants who received results. Although the participants highly valued this type of communication, additional means of communication need to be considered to improve the feasibility of large-scale ROR. The feedback collected sheds light on the value judgements of the participants and on potential responses to the receipt of genetic risk information. Biobanks in other countries are planning or conducting similar projects, and the sharing of lessons learned may provide valuable insight and aid such endeavors.
Subjects
Biological Specimen Banks , Genomics , Humans , Communication
ABSTRACT
Objective: To develop a framework for identifying temporal clinical event trajectories from Observational Medical Outcomes Partnership-formatted observational healthcare data. Materials and Methods: A 4-step framework based on significant temporal event pair detection is described and implemented as an open-source R package. It is used on a population-based Estonian dataset to first replicate a large Danish population-based study and second, to conduct a disease trajectory detection study for type 2 diabetes patients in the Estonian and Dutch databases as an example. Results: As a proof of concept, we apply the methods in the Estonian database and provide a detailed breakdown of our findings. All Estonian population-based event pairs are shown. We compare the event pairs identified from Estonia to Danish and Dutch data and discuss the causes of the differences. The overlap in the results was only 2.4%, which highlights the need for running similar studies in different populations. Conclusions: For the first time, there is a complete software package for detecting disease trajectories in health data.
ABSTRACT
BACKGROUND: Identification of rheumatoid arthritis (RA) patients at high risk of adverse health outcomes remains a major challenge. We aimed to develop and validate prediction models for a variety of adverse health outcomes in RA patients initiating first-line methotrexate (MTX) monotherapy. METHODS: Data from 15 claims and electronic health record databases across 9 countries were used. Models were developed and internally validated on Optum® De-identified Clinformatics® Data Mart Database using L1-regularized logistic regression to estimate the risk of adverse health outcomes within 3 months (leukopenia, pancytopenia, infection), 2 years (myocardial infarction (MI) and stroke), and 5 years (cancers [colorectal, breast, uterine]) after treatment initiation. Candidate predictors included demographic variables and past medical history. Models were externally validated on all other databases. Performance was assessed using the area under the receiver operating characteristic curve (AUC) and calibration plots. FINDINGS: Models were developed and internally validated on 21,547 RA patients and externally validated on 131,928 RA patients. Models for serious infection (AUC: internal 0.74, external ranging from 0.62 to 0.83), MI (AUC: internal 0.76, external ranging from 0.56 to 0.82), and stroke (AUC: internal 0.77, external ranging from 0.63 to 0.95) showed good discrimination and adequate calibration. Models for the other outcomes showed modest internal discrimination (AUC < 0.65) and were not externally validated. INTERPRETATION: We developed and validated prediction models for a variety of adverse health outcomes in RA patients initiating first-line MTX monotherapy. Final models for serious infection, MI, and stroke demonstrated good performance across multiple databases and can be studied for clinical use.
FUNDING: This activity under the European Health Data & Evidence Network (EHDEN) has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 806968. This Joint Undertaking receives support from the European Union's Horizon 2020 research and innovation programme and EFPIA.
Subjects
Antirheumatic Agents , Arthritis, Rheumatoid , Stroke , Antirheumatic Agents/therapeutic use , Arthritis, Rheumatoid/drug therapy , Cohort Studies , Humans , Methotrexate/therapeutic use , Outcome Assessment, Health Care , Stroke/etiology
ABSTRACT
The Estonian Biobank, governed by the Institute of Genomics at the University of Tartu (Biobank), has stored genetic material/DNA and continuously collected data since 2002 on a total of 52,274 individuals representing ~5% of the Estonian adult population, and the cohort continues to grow. To explore the utility of data available in the Biobank, we conducted a phenome-wide association study (PheWAS) in two areas of interest to healthcare researchers: asthma and liver disease. We used 11 asthma and 13 liver disease-associated single nucleotide polymorphisms (SNPs), identified from published genome-wide association studies, to test our ability to detect established associations. We confirmed 2 asthma and 5 liver disease associated variants at nominal significance and directionally consistent with published results. We found 2 associations that were opposite to what was published before (rs4374383:AA increases risk of NASH/NAFLD, rs11597086 increases ALT level). Three SNP-diagnosis pairs passed the phenome-wide significance threshold: rs9273349 and E06 (thyroiditis, p = 5.50 × 10⁻⁸); rs9273349 and E10 (type-1 diabetes, p = 2.60 × 10⁻⁷); and rs2281135 and K76 (non-alcoholic liver diseases, including NAFLD, p = 4.10 × 10⁻⁷). We have validated our approach and confirmed the quality of the data for these conditions. Importantly, we demonstrate that the extensive amount of genetic and medical information from the Estonian Biobank can be successfully utilized for scientific research.
Subjects
Asthma/genetics , Biological Specimen Banks/statistics & numerical data , Electronic Health Records/statistics & numerical data , Liver Diseases/genetics , Phenomics/methods , Polymorphism, Single Nucleotide , Adult , Asthma/complications , Asthma/epidemiology , Estonia/epidemiology , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Liver Diseases/complications , Liver Diseases/epidemiology , Male , Phenotype
ABSTRACT
Polygenic risk scores are gaining more and more attention for estimating genetic risks for liabilities, especially for noncommunicable diseases. They are now calculated using thousands of DNA markers. In this paper, we compare the score distributions of two previously published very large risk score models within different populations. We show that the risk score model together with its risk stratification thresholds, built upon the data of one population, cannot be applied to another population without taking into account the target population's structure. We also show that if an individual is classified to the wrong population, his/her disease risk can be systematically incorrectly estimated.
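The population-dependence described above can be illustrated with a minimal sketch: a polygenic score is a weighted sum of allele dosages, and the same raw score lands at different positions in the score distributions of two reference populations. The weights, dosages and reference scores are invented for illustration and do not come from the published models.

```python
from statistics import mean, stdev


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 per marker)."""
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(score, reference_scores):
    """Z-score of a raw score against one population's score distribution."""
    return (score - mean(reference_scores)) / stdev(reference_scores)


# Hypothetical individual with three markers.
raw = polygenic_score([0, 1, 2], [0.5, 0.2, 0.1])

# The same raw value of 3 against two hypothetical reference populations.
z_population_a = standardize(3, [1, 2, 3])
z_population_b = standardize(3, [2, 3, 4])
```

Here the identical raw score is one standard deviation above the mean in the first population and exactly average in the second, so a fixed risk threshold would classify the same person differently.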
Subjects
Coronary Disease/genetics , Diabetes Mellitus, Type 2/genetics , Genetics, Population , Multifactorial Inheritance/genetics , Africa , Americas , Asia , Estonia , Europe , Asia, Eastern , Gene Frequency , Genetic Predisposition to Disease/genetics , Humans , Models, Genetic , Polymorphism, Single Nucleotide , Principal Component Analysis , Risk Assessment/methods , Risk Assessment/statistics & numerical data , Risk Factors
ABSTRACT
Due to the heterogeneity of existing European sources of observational healthcare data, data source-tailored choices are needed to execute multi-data source, multi-national epidemiological studies. This makes transparent documentation paramount. In this proof-of-concept study, a novel standard data derivation procedure was tested in a set of heterogeneous data sources. Identification of subjects with type 2 diabetes (T2DM) was the test case. We included three primary care data sources (PCDs), three record linkage of administrative and/or registry data sources (RLDs), one hospital and one biobank. Overall, data from 12 million subjects from six European countries were extracted. Based on a shared event definition, sixteen standard algorithms (components) useful to identify T2DM cases were generated through a top-down/bottom-up iterative approach. Each component was based on one single data domain among diagnoses, drugs, diagnostic test utilization and laboratory results. Diagnoses-based components were subclassified considering the healthcare setting (primary, secondary, inpatient care). The Unified Medical Language System was used for semantic harmonization within data domains. Individual components were extracted and the proportion of population identified was compared across data sources. Drug-based components performed similarly in RLDs and PCDs, unlike diagnoses-based components. Using components as building blocks, logical combinations with AND, OR, AND NOT were tested and local experts recommended their preferred data source-tailored combination. The population identified per data source by the resulting algorithms varied from 3.5% to 15.7%; however, age-specific results were fairly comparable. The impact of individual components was assessed: diagnoses-based components identified the majority of cases in PCDs (93-100%), while drug-based components were the main contributors in RLDs (81-100%).
The proposed data derivation procedure allowed the generation of data source-tailored case-finding algorithms in a standardized fashion, facilitated transparent documentation of the process and benchmarking of data sources, and provided bases for interpretation of possible inter-data source inconsistency of findings in future studies.
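The idea of using components as building blocks combined with AND, OR and AND NOT can be sketched with set operations on patient identifiers. The component names, the chosen combination and the identifiers below are hypothetical and are not the algorithms recommended in the study.

```python
def combine_components(components):
    """Example combination: (diagnosis OR drug) AND NOT exclusion.

    Each component is the set of subject identifiers it flags.
    """
    either = components["diagnosis"] | components["drug"]  # OR
    return either - components["exclusion"]                # AND NOT


# Hypothetical components extracted from one data source.
components = {
    "diagnosis": {"p1", "p2", "p3"},
    "drug": {"p3", "p4"},
    "exclusion": {"p4"},
}

cases = combine_components(components)
confirmed_by_both = components["diagnosis"] & components["drug"]  # AND
```

Keeping each component as a separate set makes it straightforward to report how many cases each one contributes, which is the kind of per-component impact assessment the abstract describes.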