Results 1 - 20 of 45
1.
PLoS One ; 18(12): e0292089, 2023.
Article in English | MEDLINE | ID: mdl-38096249

ABSTRACT

Genome-scale data have revealed daily rhythms in various species and tissues. However, current methods to assess rhythmicity largely restrict their focus to quantifying statistical significance, which may not reflect biological relevance. To address this limitation, we developed a method called LimoRhyde2 (the successor to our method LimoRhyde), which focuses instead on rhythm-related effect sizes and their uncertainty. For each genomic feature, LimoRhyde2 fits a curve using a series of linear models based on periodic splines, moderates the fits using an Empirical Bayes approach called multivariate adaptive shrinkage (Mash), then uses the moderated fits to calculate rhythm statistics such as peak-to-trough amplitude. The periodic splines capture non-sinusoidal rhythmicity, while Mash uses patterns in the data to account for different fits having different levels of noise. To demonstrate LimoRhyde2's utility, we applied it to multiple circadian transcriptome datasets. Overall, LimoRhyde2 prioritized genes having high-amplitude rhythms in expression, whereas a prior method (BooteJTK) prioritized "statistically significant" genes whose amplitudes could be relatively small. Thus, quantifying effect sizes using approaches such as LimoRhyde2 has the potential to transform interpretation of genomic data related to biological rhythms.


Subjects
Circadian Rhythm , Genomics , Circadian Rhythm/genetics , Bayes Theorem , Transcriptome , Genome
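
The effect-size idea in the abstract above (fit a periodic curve, then read off a statistic such as peak-to-trough amplitude) can be sketched as follows. This is a minimal illustration under stated assumptions, not the LimoRhyde2 implementation: it swaps the periodic splines for a two-harmonic Fourier fit, omits the Mash moderation step entirely, and assumes evenly spaced samples covering whole periods so that the least-squares coefficients have a closed form. The function names and the toy data are hypothetical.

```python
import math

def fit_harmonics(times, values, period=24.0, n_harmonics=2):
    """Least-squares Fourier fit (stand-in for periodic splines).

    Assumes evenly spaced samples spanning a whole number of periods,
    in which case the coefficients reduce to simple weighted sums.
    """
    n = len(values)
    mean = sum(values) / n
    coefs = []
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k / period
        a = 2 / n * sum(v * math.cos(w * t) for t, v in zip(times, values))
        b = 2 / n * sum(v * math.sin(w * t) for t, v in zip(times, values))
        coefs.append((a, b))
    return mean, coefs

def fitted_value(t, mean, coefs, period=24.0):
    """Evaluate the fitted periodic curve at time t."""
    total = mean
    for k, (a, b) in enumerate(coefs, start=1):
        w = 2 * math.pi * k / period
        total += a * math.cos(w * t) + b * math.sin(w * t)
    return total

def peak_to_trough(mean, coefs, period=24.0, n_grid=240):
    """Peak-to-trough amplitude of the fitted curve, evaluated on a grid."""
    grid = [fitted_value(i * period / n_grid, mean, coefs, period)
            for i in range(n_grid)]
    return max(grid) - min(grid)

# Toy data (hypothetical): one gene sampled every 2 h for 48 h.
times = list(range(0, 48, 2))
values = [5 + 3 * math.cos(2 * math.pi * t / 24) for t in times]

mean, coefs = fit_harmonics(times, values)
amplitude = peak_to_trough(mean, coefs)
```

Ranking features by `amplitude` rather than by a p-value is the kind of prioritization the abstract contrasts with significance-based methods.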
2.
Am J Hum Genet ; 110(9): 1522-1533, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37607538

ABSTRACT

Population-scale biobanks linked to electronic health record data provide vast opportunities to extend our knowledge of human genetics and discover new phenotype-genotype associations. Given their dense phenotype data, biobanks can also facilitate replication studies on a phenome-wide scale. Here, we introduce the phenotype-genotype reference map (PGRM), a set of 5,879 genetic associations from 523 GWAS publications that can be used for high-throughput replication experiments. PGRM phenotypes are standardized as phecodes, ensuring interoperability between biobanks. We applied the PGRM to five ancestry-specific cohorts from four independent biobanks and found evidence of robust replications across a wide array of phenotypes. We show how the PGRM can be used to detect data corruption and to empirically assess parameters for phenome-wide studies. Finally, we use the PGRM to explore factors associated with replicability of GWAS results.


Subjects
Biological Specimen Banks , Data Science , Humans , Phenomics , Phenotype , Genotype
3.
bioRxiv ; 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36778295

ABSTRACT

Genome-scale data have revealed daily rhythms in various species and tissues. However, current methods to assess rhythmicity largely restrict their focus to quantifying statistical significance, which may not reflect biological relevance. To address this limitation, we developed a method called LimoRhyde2 (the successor to our method LimoRhyde), which focuses instead on rhythm-related effect sizes and their uncertainty. For each genomic feature, LimoRhyde2 fits a curve using a series of linear models based on periodic splines, moderates the fits using an Empirical Bayes approach called multivariate adaptive shrinkage (Mash), then uses the moderated fits to calculate rhythm statistics such as peak-to-trough amplitude. The periodic splines capture non-sinusoidal rhythmicity, while Mash uses patterns in the data to account for different fits having different levels of noise. To demonstrate LimoRhyde2's utility, we applied it to multiple circadian transcriptome datasets. Overall, LimoRhyde2 prioritized genes having high-amplitude rhythms in expression, whereas a prior method (BooteJTK) prioritized "statistically significant" genes whose amplitudes could be relatively small. Thus, quantifying effect sizes using approaches such as LimoRhyde2 has the potential to transform interpretation of genomic data related to biological rhythms.

4.
J Biol Rhythms ; 38(1): 3-14, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36419398

ABSTRACT

Biomedical research on mammals has traditionally neglected females, raising the concern that some scientific findings may generalize poorly to half the population. Although this lack of sex inclusion has been broadly documented, its extent within circadian genomics remains undescribed. To address this gap, we examined sex inclusion practices in a comprehensive collection of publicly available transcriptome studies on daily rhythms. Among 148 studies having samples from mammals in vivo, we found strong underrepresentation of females across organisms and tissues. Overall, only 23 of 123 studies in mice, 0 of 10 studies in rats, and 9 of 15 studies in humans included samples from females. In addition, studies having samples from both sexes tended to have more samples from males than from females. These trends appear to have changed little over time, including since 2016, when the US National Institutes of Health began requiring investigators to consider sex as a biological variable. Our findings highlight an opportunity to dramatically improve representation of females in circadian research and to explore sex differences in daily rhythms at the genome level.


Subjects
Biomedical Research , Circadian Rhythm , Humans , Rats , Mice , Male , Female , Animals , Transcriptome , Mammals , Genomics , Sex Characteristics
5.
PeerJ ; 10: e14372, 2022.
Article in English | MEDLINE | ID: mdl-36389425

ABSTRACT

Transcriptome data have become invaluable for interrogating biological systems. Preparing a transcriptome dataset for analysis, particularly an RNA-seq dataset, entails multiple steps and software programs, each with its own command-line interface (CLI). Although these CLIs are powerful, they often require shell scripting for automation and parallelization, which can have a high learning curve, especially when the details of the CLIs vary from one tool to another. However, many individuals working with transcriptome data are already familiar with R due to the plethora and popularity of R-based tools for analyzing biological data. Thus, we developed an R package called seeker for simplified fetching and processing of RNA-seq and microarray data. Seeker is a wrapper around various existing tools, and provides a standard interface, simple parallelization, and detailed logging. Seeker's primary output (sample metadata and gene expression values based on Entrez or Ensembl Gene IDs) can be directly plugged into a differential expression analysis. To maximize reproducibility, seeker is available as a standalone R package and in a Docker image that includes all dependencies, both of which are accessible at https://seeker.hugheylab.org.


Subjects
Software , Transcriptome , Humans , Transcriptome/genetics , Reproducibility of Results , RNA-Seq
6.
Bioinformatics ; 38(21): 4972-4974, 2022 10 31.
Article in English | MEDLINE | ID: mdl-36083022

ABSTRACT

SUMMARY: Electronic health record (EHR) data linked to DNA biobanks are a valuable resource for understanding the phenotypic effects of human genetic variation. We previously developed the phenotype risk score (PheRS) as an approach to quantify the extent to which a patient's clinical features resemble a given Mendelian disease. Using PheRS, we have uncovered novel associations between Mendelian disease-like phenotypes and rare genetic variants, and identified patients who may have undiagnosed Mendelian disease. Although the PheRS approach is conceptually simple, it involves multiple mapping steps and was previously only available as custom scripts, limiting the approach's usability. Thus, we developed the phers R package, a complete and user-friendly set of functions and maps for performing a PheRS-based analysis on linked clinical and genetic data. The package includes up-to-date maps between EHR-based phenotypes (i.e. ICD codes and phecodes), human phenotype ontology terms and Mendelian diseases. Starting with occurrences of ICD codes, the package enables the user to calculate PheRSs, validate the scores using case-control analyses, and perform genetic association analyses. By increasing PheRS's transparency and usability, the phers R package will help improve our understanding of the relationships between rare genetic variants and clinically meaningful human phenotypes. AVAILABILITY AND IMPLEMENTATION: The phers R package is free and open-source and available on CRAN and at https://phers.hugheylab.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Electronic Health Records , Software , Humans , Phenotype , Risk Factors , Genetic Testing
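
To make the scoring idea in the abstract above concrete, here is a small sketch of a phenotype risk score. It is not the phers package API. It assumes, as a labeled assumption not stated in the abstract, that each disease feature present in a patient contributes a weight equal to the negative log of that feature's prevalence in the cohort, so rarer features count more. The feature codes ("F1", "F2", "F3"), patient IDs, and function names are all hypothetical.

```python
import math

def feature_weights(cohort):
    """Weight each feature by -log(prevalence) in the cohort (assumed scheme).

    cohort: dict mapping patient ID -> set of feature codes.
    """
    n = len(cohort)
    counts = {}
    for codes in cohort.values():
        for code in codes:
            counts[code] = counts.get(code, 0) + 1
    return {code: -math.log(k / n) for code, k in counts.items()}

def phenotype_risk_score(patient_codes, disease_features, weights):
    """Sum the weights of the disease's features that the patient has."""
    return sum(weights.get(code, 0.0)
               for code in disease_features & patient_codes)

# Hypothetical cohort of four patients with made-up feature codes.
cohort = {
    "p1": {"F1", "F2"},
    "p2": {"F1"},
    "p3": {"F1"},
    "p4": {"F3"},
}
disease_features = {"F1", "F2"}  # hypothetical feature set for one disease

weights = feature_weights(cohort)
scores = {pid: phenotype_risk_score(codes, disease_features, weights)
          for pid, codes in cohort.items()}
```

In this toy cohort, patient p1 carries both the common and the rare feature and therefore receives the highest score, while p4, who has none of the disease's features, scores zero.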
7.
Bioinformatics ; 38(8): 2297-2306, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35157022

ABSTRACT

MOTIVATION: Logistic regression models are used in genomic studies to analyze genetic data linked to electronic health records (EHRs), but they do not take full advantage of the time-to-event information available in EHRs. Previous work has shown that Cox regression, which can account for left truncation and right censoring in EHRs, increased the power to detect genotype-phenotype associations compared to logistic regression. We extend this work to evaluate the relative performance of Cox regression and various logistic regression models in the presence of positive errors in event time (delayed event time), relating to recorded event time accuracy. RESULTS: One Cox model and three logistic regression models were considered under different scenarios of delayed event time. Extensive simulations and a genomic study application were used to evaluate the impact of delayed event time. While logistic regression does not model the time-to-event directly, various logistic regression models used in the literature were more sensitive to delayed event time than Cox regression. Results highlighted the importance of identifying and excluding patients diagnosed before entry time. Cox regression had similar or modestly improved statistical power compared with various logistic regression models at controlled type I error. This was supported by the empirical data, where the Cox models consistently had the highest sensitivity to detect known genotype-phenotype associations under all scenarios of delayed event time. AVAILABILITY AND IMPLEMENTATION: Access to individual-level EHR and genotype data is restricted by the IRB. Simulation code and R scripts for data processing are at: https://github.com/QingxiaCindyChen/CoxRobustEHR.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Electronic Health Records , Genome-Wide Association Study , Proportional Hazards Models , Logistic Models , Genotype , Computer Simulation
8.
J Anal Toxicol ; 46(1): 99-102, 2022 Feb 14.
Article in English | MEDLINE | ID: mdl-33216907

ABSTRACT

Point-of-care (POC) urine drug screening (UDS) assays provide immediate information for patient management. However, POC UDS assays can produce false-positive results, which may not be recognized until confirmatory testing is completed several days later. To minimize the potential for patient harm, it is critical to identify sources of interference. Here, we applied an approach based on statistical analysis of electronic health record (EHR) data to identify medications that may cause false positives on POC UDS assays. From our institution's EHR data, we extracted 120,670 POC UDS and confirmation results, covering 12 classes of target drugs, along with each individual's prior medication exposures. Our approach is based on the idea that exposure to an interfering medication will increase the odds of a false-positive UDS result. For a given assay-medication pair, we quantified the association between medication exposures and UDS results as an odds ratio from logistic regression. We evaluated interference experimentally by spiking compounds into drug-free urine and testing the spiked samples on the POC device. Our dataset included 446 false-positive UDS results (presumptive positive screen followed by negative confirmation). We quantified the odds ratio of false positives for 528 assay-medication pairs. Of the six assay-medication pairs we evaluated experimentally, two showed interference capable of producing a presumptive positive: labetalol on the 3,4-methylenedioxymethamphetamine (MDMA) assay (at 200 µg/mL) and ranitidine on the methamphetamine assay (at 50 µg/mL). Ranitidine also produced a presumptive positive for opiates at 1,600 µg/mL and for propoxyphene at 800 µg/mL. These findings highlight the generalizability and the limits of our approach to use EHR data to identify medications that interfere with clinical immunoassays.


Subjects
Electronic Health Records , Point-of-Care Systems , Substance Abuse Detection , Urinalysis , False Positive Reactions , Humans
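
The association measure described in the abstract above can be illustrated with a 2x2 table. For a single binary exposure and no covariates, the odds ratio from logistic regression equals the cross-product ratio of the table, so the sketch below computes that ratio with a Wald 95% confidence interval. It is a simplified illustration only: the study's actual models may include adjustments not shown here, the 0.5 continuity correction for empty cells is an added assumption, and the counts and function name are invented.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed, false positive      b: exposed, not false positive
    c: unexposed, false positive    d: unexposed, not false positive
    """
    cells = [a, b, c, d]
    if any(x == 0 for x in cells):
        # Continuity correction so the ratio stays finite (an assumption).
        cells = [x + 0.5 for x in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(sum(1 / x for x in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Invented counts for one hypothetical assay-medication pair.
or_est, ci_low, ci_high = odds_ratio_with_ci(12, 188, 30, 9770)
```

An interval that lies entirely above 1 would flag the medication as a candidate for experimental follow-up, such as the spiking experiments the abstract describes.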
9.
Eur J Neurosci ; 54(9): 7063-7071, 2021 11.
Article in English | MEDLINE | ID: mdl-34486778

ABSTRACT

Circadian clocks play key roles in how organisms respond to and even anticipate seasonal change in day length, or photoperiod. In mammals, photoperiod is encoded by the central circadian pacemaker in the brain, the suprachiasmatic nucleus (SCN). The subpopulation of SCN neurons that secrete the neuropeptide VIP mediates the transmission of light information within the SCN neural network, suggesting a role for these neurons in circadian plasticity in response to light information that has yet to be directly tested. Here, we used in vivo optogenetic stimulation of VIPergic SCN neurons followed by ex vivo PERIOD 2::LUCIFERASE (PER2::LUC) bioluminescent imaging to test whether activation of this SCN neuron subpopulation can induce SCN network changes that are hallmarks of photoperiodic encoding. We found that optogenetic stimulation designed to mimic a long photoperiod indeed altered subsequent SCN entrained phase, increased the phase dispersal of PER2 rhythms within the SCN network, and shortened SCN free-running period, similar to the effects of a true extension of photoperiod. Optogenetic stimulation also induced analogous changes on related aspects of locomotor behaviour in vivo. Thus, selective activation of VIPergic SCN neurons induces photoperiodic network plasticity in the SCN that underpins photoperiodic entrainment of behaviour.


Subjects
Circadian Clocks , Suprachiasmatic Nucleus Neurons , Animals , Circadian Rhythm , Mammals , Motor Activity , Optogenetics , Photoperiod , Suprachiasmatic Nucleus
10.
Annu Rev Genomics Hum Genet ; 22: 219-238, 2021 08 31.
Article in English | MEDLINE | ID: mdl-34038146

ABSTRACT

Recent advances in genomic technology and widespread adoption of electronic health records (EHRs) have accelerated the development of genomic medicine, bringing promising research findings from genome science into clinical practice. Genomic and phenomic data, accrued across large populations through biobanks linked to EHRs, have enabled the study of genetic variation at a phenome-wide scale. Through new quantitative techniques, pleiotropy can be explored with phenome-wide association studies, the occurrence of common complex diseases can be predicted using the cumulative influence of many genetic variants (polygenic risk scores), and undiagnosed Mendelian syndromes can be identified using EHR-based phenotypic signatures (phenotype risk scores). In this review, we trace the role of EHRs from the development of genome-wide analytic techniques to translational efforts to test these new interventions in the clinic. Throughout, we describe the challenges that remain when combining EHRs with genetics to improve clinical care.


Subjects
Electronic Health Records , Genome-Wide Association Study , Genomics , Humans , Phenotype , Risk Factors
11.
Genome Res ; 31(10): 1742-1752, 2021 10.
Article in English | MEDLINE | ID: mdl-33837131

ABSTRACT

A major challenge for droplet-based single-cell sequencing technologies is distinguishing true cells from uninformative barcodes in data sets with disparate library sizes confounded by high technical noise (i.e., batch-specific ambient RNA). We present dropkick, a fully automated software tool for quality control and filtering of single-cell RNA sequencing (scRNA-seq) data with a focus on excluding ambient barcodes and recovering real cells bordering the quality threshold. By automatically determining data set-specific training labels based on predictive global heuristics, dropkick learns a gene-based representation of real cells and ambient noise, calculating a cell probability score for each barcode. Using simulated and real-world scRNA-seq data, we benchmarked dropkick against conventional thresholding approaches and EmptyDrops, a popular computational method, showing greater recovery of rare cell types and exclusion of empty droplets and noisy, uninformative barcodes. We show for both low- and high-background data sets that dropkick's weakly supervised model reliably learns which genes are enriched in ambient barcodes and draws a multidimensional boundary that is more robust to data set-specific variation than existing filtering approaches. dropkick provides a fast, automated tool for reproducible cell identification from scRNA-seq data that is critical to downstream analysis and compatible with popular single-cell Python packages.


Subjects
Single-Cell Analysis , Software , Gene Expression Profiling/methods , Quality Control , RNA/genetics , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
12.
PeerJ ; 9: e11071, 2021.
Article in English | MEDLINE | ID: mdl-33763309

ABSTRACT

PubMed is an invaluable resource for the biomedical community. Although PubMed is freely available, the existing API is not designed for large-scale analyses and the XML structure of the underlying data is inconvenient for complex queries. We developed an R package called pmparser to convert the data in PubMed to a relational database. Our implementation of the database, called PMDB, currently contains data on over 31 million PubMed Identifiers (PMIDs) and is updated regularly. Together, pmparser and PMDB can enable large-scale, reproducible, and transparent analyses of the biomedical literature. pmparser is licensed under GPL-2 and available at https://pmparser.hugheylab.org. PMDB is available in both PostgreSQL (DOI 10.5281/zenodo.4008109) and Google BigQuery (https://console.cloud.google.com/bigquery?project=pmdb-bq&d=pmdb).

13.
PLoS Comput Biol ; 17(1): e1008567, 2021 01.
Article in English | MEDLINE | ID: mdl-33406069

ABSTRACT

The chi-square periodogram (CSP), developed over 40 years ago, continues to be one of the most popular methods to estimate the period of circadian (circa 24-h) rhythms. Previous work has indicated the CSP is sometimes less accurate than other methods, but understanding of why and under what conditions remains incomplete. Using simulated rhythmic time-courses, we found that the CSP is prone to underestimating the period in a manner that depends on the true period and the length of the time-course. This underestimation bias is most severe in short time-courses (e.g., 3 days), but is also visible in longer simulated time-courses (e.g., 12 days) and in experimental time-courses of mouse wheel-running and ex vivo bioluminescence. We traced the source of the bias to discontinuities in the periodogram that are related to the number of time-points the CSP uses to calculate the observed variance for a given test period. By revising the calculation to avoid discontinuities, we developed a new version, the greedy CSP, that shows reduced bias and improved accuracy. Nonetheless, even the greedy CSP tended to be less accurate on our simulated time-courses than an alternative method, namely the Lomb-Scargle periodogram. Thus, although our study describes a major improvement to a classic method, it also suggests that users should generally avoid the CSP when estimating the period of biological rhythms.


Subjects
Chi-Square Distribution , Circadian Rhythm/physiology , Computational Biology/standards , Animals , Bias , Data Interpretation, Statistical , Mice , Models, Biological , Research Design/standards
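
For readers unfamiliar with the method discussed in the abstract above, the following is a rough sketch of one common formulation of the chi-square periodogram statistic: the time-course is folded into rows of length P, and the variance of the column means is compared with the total variance. It is an assumption-laden simplification, not the code from the paper. In particular, it uses only the first K·P points for every term, which may differ from both the classic and the greedy variants the authors compare, and test periods are expressed in whole numbers of samples. The function name and the toy series are hypothetical.

```python
import math

def chi_square_periodogram(values, test_periods):
    """Return {P: Q_P} for each test period P (in samples).

    Q_P = K * N * sum_h (M_h - M)^2 / sum_i (x_i - M)^2,
    where the first N = K * P points are folded into K rows of P columns,
    M_h is the mean of column h, and M is the mean of the N points used.
    """
    q = {}
    for p in test_periods:
        k = len(values) // p
        if k < 2:
            continue  # need at least two full cycles to fold
        used = values[:k * p]
        n = len(used)
        grand_mean = sum(used) / n
        col_means = [sum(used[h::p]) / k for h in range(p)]
        between = sum((m - grand_mean) ** 2 for m in col_means)
        total = sum((x - grand_mean) ** 2 for x in used)
        q[p] = k * n * between / total if total > 0 else 0.0
    return q

# Toy time-course (hypothetical): noiseless 24-sample rhythm, 10 cycles.
series = [math.sin(2 * math.pi * i / 24) for i in range(240)]
q_stats = chi_square_periodogram(series, range(20, 29))
best_period = max(q_stats, key=q_stats.get)
```

Because K and the number of points used both change abruptly as P varies, a statistic of this form is not a smooth function of the test period, which is the kind of discontinuity the abstract identifies as the source of bias.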
14.
J Anal Toxicol ; 45(4): 325-330, 2021 Apr 12.
Article in English | MEDLINE | ID: mdl-32991692

ABSTRACT

Urine drug screening (UDS) assays can rapidly and sensitively detect drugs of abuse but can also produce spurious results due to interfering substances. We previously developed an approach to identify interfering medications using electronic health record (EHR) data, but the approach was limited to UDS assays for which presumptive positives were confirmed using more specific methods. Here we adapted the approach to search for medications that cause false positives on UDS assays lacking confirmation data. From our institution's EHR data, we used our previous dataset of 698,651 UDS and confirmation results. We also collected 211,108 UDS results for acetaminophen, ethanol and salicylates. Both datasets included individuals' prior medication exposures. We hypothesized that the odds of a presumptive positive would increase following exposure to an interfering medication independently of exposure to the assay's target drug(s). For a given assay-medication pair, we quantified potential interference as an odds ratio from logistic regression. We evaluated interference of selected compounds in spiking experiments. Compared to the approach requiring confirmation data, our adapted approach showed only modestly diminished ability to detect interfering medications. Applying our approach to the new data, we discovered and validated multiple compounds that can cause presumptive positives on the UDS assay for acetaminophen. Our approach can reveal interfering medications using EHR data from institutions at which UDS results are not routinely confirmed.


Subjects
Substance Abuse Detection , Drug Evaluation, Preclinical , Humans
15.
J Biol Rhythms ; 35(4): 353-367, 2020 08.
Article in English | MEDLINE | ID: mdl-32527181

ABSTRACT

Seasonal light cycles influence multiple physiological functions and are mediated through photoperiodic encoding by the circadian system. Despite our knowledge of the strong connection between seasonal light input and downstream circadian changes, less is known about the specific components of seasonal light cycles that are encoded and induce persistent changes in the circadian system. Using combinations of 3 T cycles (23, 24, 26 h) and 2 photoperiods per T cycle (long and short, with duty cycles scaled to each T cycle), we investigate the after-effects of entrainment to these 6 light cycles. We measure locomotor behavior duration (α), period (τ), and entrained phase angle (ψ) in vivo and SCN phase distribution (σφ), τ, and ψ ex vivo to refine our understanding of critical light components for influencing particular circadian properties. We find that both photoperiod and T-cycle length drive determination of in vivo ψ but differentially influence after-effects in α and τ, with photoperiod driving changes in α and photoperiod length and T-cycle length combining to influence τ. Using skeleton photoperiods, we demonstrate that in vivo ψ is determined by both parametric and nonparametric components, while changes in α are driven nonparametrically. Within the ex vivo SCN, we find that ψ and σφ of the PER2∷LUCIFERASE rhythm follow closely with their likely behavioral counterparts (ψ and α of the locomotor activity rhythm) while also confirming previous reports of τ after-effects of gene expression rhythms showing negative correlations with behavioral τ after-effects in response to T cycles. We demonstrate that within-SCN σφ changes, thought to underlie α changes in vivo, are induced primarily nonparametrically. Taken together, our results demonstrate that distinct components of seasonal light input differentially influence ψ, α, and τ and suggest the possibility of separate mechanisms driving the persistent changes in circadian behaviors mediated by seasonal light.


Subjects
Circadian Clocks/genetics , Circadian Rhythm/radiation effects , Light , Motor Activity/radiation effects , Photoperiod , Animals , Circadian Clocks/radiation effects , Mammals , Mice , Suprachiasmatic Nucleus/physiology
16.
PLoS Biol ; 18(2): e3000622, 2020 02.
Article in English | MEDLINE | ID: mdl-32108181

ABSTRACT

Circadian (daily) regulation of metabolic pathways implies that food may be metabolized differentially over the daily cycle. To test that hypothesis, we monitored the metabolism of older subjects in a whole-room respiratory chamber over two separate 56-h sessions in a random crossover design. In one session, one of the 3 daily meals was presented as breakfast, whereas in the other session, a nutritionally equivalent meal was presented as a late-evening snack. The duration of the overnight fast was the same for both sessions. Whereas the two sessions did not differ in overall energy expenditure, the respiratory exchange ratio (RER) was different during sleep between the two sessions. Unexpectedly, this difference in RER due to daily meal timing was not due to daily differences in physical activity, sleep disruption, or core body temperature (CBT). Rather, we found that the daily timing of nutrient availability coupled with daily/circadian control of metabolism drives a switch in substrate preference such that the late-evening Snack Session resulted in significantly lower lipid oxidation (LO) compared to the Breakfast Session. Therefore, the timing of meals during the day/night cycle affects how ingested food is oxidized or stored in humans, with important implications for optimal eating habits.


Subjects
Circadian Rhythm/physiology , Lipid Metabolism/physiology , Meals/physiology , Body Mass Index , Breakfast , Carbohydrate Metabolism/physiology , Cross-Over Studies , Feeding Behavior/physiology , Female , Humans , Male , Middle Aged , Oxidation-Reduction , Pulmonary Gas Exchange/physiology , Sleep/physiology , Snacks
17.
Elife ; 8, 2019 12 06.
Article in English | MEDLINE | ID: mdl-31808742

ABSTRACT

Preprints in biology are becoming more popular, but only a small fraction of the articles published in peer-reviewed journals have previously been released as preprints. To examine whether releasing a preprint on bioRxiv was associated with the attention and citations received by the corresponding peer-reviewed article, we assembled a dataset of 74,239 articles, 5,405 of which had a preprint, published in 39 journals. Using log-linear regression and random-effects meta-analysis, we found that articles with a preprint had, on average, a 49% higher Altmetric Attention Score and 36% more citations than articles without a preprint. These associations were independent of several other article- and author-level variables (such as scientific subfield and number of authors), and were unrelated to journal-level variables such as access model and Impact Factor. This observational study can help researchers and publishers make informed decisions about how to incorporate preprints into their work.


Subjects
Peer Review , Preprints as Topic , Meta-Analysis as Topic , Periodicals as Topic , Regression Analysis
18.
BMC Genomics ; 20(1): 805, 2019 Nov 04.
Article in English | MEDLINE | ID: mdl-31684865

ABSTRACT

BACKGROUND: The growth of DNA biobanks linked to data from electronic health records (EHRs) has enabled the discovery of numerous associations between genomic variants and clinical phenotypes. Nonetheless, although clinical data are generally longitudinal, standard approaches for detecting genotype-phenotype associations in such linked data, notably logistic regression, do not naturally account for variation in the period of follow-up or the time at which an event occurs. Here we explored the advantages of quantifying associations using Cox proportional hazards regression, which can account for the age at which a patient first visited the healthcare system (left truncation) and the age at which a patient either last visited the healthcare system or acquired a particular phenotype (right censoring). RESULTS: In comprehensive simulations, we found that, compared to logistic regression, Cox regression had greater power at equivalent Type I error. We then scanned for genotype-phenotype associations using logistic regression and Cox regression on 50 phenotypes derived from the EHRs of 49,792 genotyped individuals. Consistent with the findings from our simulations, Cox regression had approximately 10% greater relative sensitivity for detecting known associations from the NHGRI-EBI GWAS Catalog. In terms of effect sizes, the hazard ratios estimated by Cox regression were strongly correlated with the odds ratios estimated by logistic regression. CONCLUSIONS: As longitudinal health-related data continue to grow, Cox regression may improve our ability to identify the genetic basis for a wide range of human phenotypes.


Subjects
Electronic Health Records , Genomics , Genotype , Phenotype , Proportional Hazards Models , Genome-Wide Association Study , Humans , Neoplasms/genetics
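
The abstract above notes that Cox regression can account for left truncation (age at first visit) and right censoring (age at last visit or diagnosis). A minimal sketch of how that enters the Cox partial likelihood follows: a person counts as at risk at age t only if they entered before t and had not yet exited. This is an illustration with a single covariate and the Breslow convention for ties, not the authors' analysis code. The record fields, the four invented individuals, and the function names are hypothetical.

```python
import math

def risk_set(subjects, t):
    """Subjects under observation at age t: entered before t, exit at or after t."""
    return [s for s in subjects if s["entry"] < t <= s["exit"]]

def cox_log_partial_likelihood(subjects, beta):
    """Log partial likelihood for one covariate x with left truncation.

    Each event contributes beta * x minus the log of the summed
    exp(beta * x) over everyone at risk at the event age.
    """
    loglik = 0.0
    for s in subjects:
        if not s["event"]:
            continue  # censored subjects only appear in risk sets
        at_risk = risk_set(subjects, s["exit"])
        denom = sum(math.exp(beta * r["x"]) for r in at_risk)
        loglik += beta * s["x"] - math.log(denom)
    return loglik

# Invented records: ages at first visit (entry) and last visit or diagnosis (exit).
subjects = [
    {"id": "A", "entry": 20, "exit": 50, "event": True,  "x": 1},
    {"id": "B", "entry": 30, "exit": 60, "event": False, "x": 0},
    {"id": "C", "entry": 55, "exit": 70, "event": True,  "x": 1},
    {"id": "D", "entry": 25, "exit": 65, "event": True,  "x": 0},
]

ids_at_50 = sorted(s["id"] for s in risk_set(subjects, 50))
loglik_null = cox_log_partial_likelihood(subjects, beta=0.0)
```

Subject C first appears at age 55 and is therefore excluded from the risk set at age 50. A logistic model on ever-diagnosed status has no direct equivalent of this exclusion.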
19.
Clin Chem ; 65(12): 1522-1531, 2019 12.
Article in English | MEDLINE | ID: mdl-31578215

ABSTRACT

BACKGROUND: Exposure to drugs of abuse is frequently assessed using urine drug screening (UDS) immunoassays. Although fast and relatively inexpensive, UDS assays often cross-react with unrelated compounds, which can lead to false-positive results and impair patient care. The current process of identifying cross-reactivity relies largely on case reports, making it sporadic and inefficient, and rendering knowledge of cross-reactivity incomplete. Here, we present a systematic approach to discover cross-reactive substances using data from electronic health records (EHRs). METHODS: Using our institution's EHR data, we assembled a data set of 698,651 UDS results across 10 assays and linked each UDS result to the corresponding individual's previous medication exposures. We hypothesized that exposure to a cross-reactive ingredient would increase the odds of a false-positive screen. For 2,201 assay-ingredient pairs, we quantified potential cross-reactivity as an odds ratio from logistic regression. We then evaluated cross-reactivity experimentally by spiking the ingredient or its metabolite into drug-free urine and testing the spiked samples on each assay. RESULTS: Our approach recovered multiple known cross-reactivities. After accounting for concurrent exposures to multiple ingredients, we selected 18 compounds (13 parent drugs and 5 metabolites) to evaluate experimentally. We validated 12 of 13 tested assay-ingredient pairs expected to show cross-reactivity by our analysis, discovering previously unknown cross-reactivities affecting assays for amphetamines, buprenorphine, cannabinoids, and methadone. CONCLUSIONS: Our findings can help laboratorians and providers interpret presumptive positive UDS results. Our data-driven approach can serve as a model for high-throughput discovery of substances that interfere with laboratory tests.


Subjects
Cross Reactions/immunology , Drug Evaluation, Preclinical/methods , Substance Abuse Detection/methods , Urinalysis/methods , Electronic Health Records , False Positive Reactions , Humans , Immunoassay/methods , Mass Screening/methods
20.
J Am Med Inform Assoc ; 26(12): 1437-1447, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31609419

ABSTRACT

OBJECTIVE: The Phenotype Risk Score (PheRS) is a method to detect Mendelian disease patterns using phenotypes from the electronic health record (EHR). We compared the performance of different approaches mapping EHR phenotypes to Mendelian disease features. MATERIALS AND METHODS: PheRS utilizes Mendelian disease descriptions annotated with Human Phenotype Ontology (HPO) terms. In previous work, we presented a map linking phecodes (based on International Classification of Diseases [ICD]-Ninth Revision) to HPO terms. For this study, we integrated ICD-Tenth Revision codes and lab data. We also created a new map between HPO terms using customized groupings of ICD codes. We compared the performance with cases and controls for 16 Mendelian diseases using 2.5 million de-identified medical records. RESULTS: PheRS effectively distinguished cases from controls for all 15 positive controls and all approaches tested (P < 4 × 10⁻¹⁶). Adding lab data led to a statistically significant improvement for 4 of 14 diseases. The custom ICD groupings improved specificity, leading to an average 8% increase for precision at 100 (-2% to 22%). Eight of 10 adults with cystic fibrosis tested had PheRS in the 95th percentile prior to diagnosis. DISCUSSION: Both phecodes and custom ICD groupings were able to detect differences between affected cases and controls at the population level. The ICD map showed better precision for the highest scoring individuals. Adding lab data improved performance at detecting population-level differences. CONCLUSIONS: PheRS is a scalable method to study Mendelian disease at the population level using electronic health record data and can potentially be used to find patients with undiagnosed Mendelian disease.


Subjects
Data Mining/methods , Electronic Health Records , Genetic Diseases, Inborn/diagnosis , Phenotype , Adult , Child , Cystic Fibrosis , Genetic Diseases, Inborn/genetics , Humans , International Classification of Diseases , Risk Factors