Results 1 - 20 of 44
1.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38175789

ABSTRACT

SUMMARY: Knowledge graphs are being increasingly used in biomedical research to link large amounts of heterogeneous data and facilitate reasoning across diverse knowledge sources. Wider adoption and exploration of knowledge graphs in the biomedical research community is limited by requirements to understand the underlying graph structure in terms of entity types and relationships, represented as nodes and edges, respectively, and to learn specialized query languages for graph mining and exploration. We have developed a user-friendly interface dubbed ExEmPLAR (Extracting, Exploring, and Embedding Pathways Leading to Actionable Research) to aid reasoning over biomedical knowledge graphs and assist with data-driven research and hypothesis generation. We explain the key functionalities of ExEmPLAR and demonstrate its use with a case study considering the relationship of Trypanosoma cruzi, the etiological agent of Chagas disease, to frequently associated cardiovascular conditions. AVAILABILITY AND IMPLEMENTATION: ExEmPLAR is freely accessible at https://www.exemplar.mml.unc.edu/. For code and instructions for using the application, see: https://github.com/beasleyjonm/AOP-COP-Path-Extractor.


Subject(s)
Biomedical Research , Pattern Recognition, Automated
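
The node-and-edge structure described in the abstract above can be illustrated with a toy example. This is a minimal sketch, not ExEmPLAR's implementation: the triples, the relationship labels, and the `build_adjacency` and `find_paths` helpers are hypothetical, chosen only to mirror the case study (Trypanosoma cruzi, Chagas disease, cardiovascular conditions).

```python
from collections import defaultdict

# Toy knowledge graph as (subject, relationship, object) triples.
# All labels are illustrative assumptions, not taken from ExEmPLAR or ROBOKOP.
TRIPLES = [
    ("Trypanosoma cruzi", "causes", "Chagas disease"),
    ("Chagas disease", "associated_with", "cardiovascular condition"),
    ("Trypanosoma cruzi", "is_a", "parasite"),
]


def build_adjacency(triples):
    """Index edges by source node: node -> list of (relationship, target)."""
    adjacency = defaultdict(list)
    for subject, relationship, obj in triples:
        adjacency[subject].append((relationship, obj))
    return adjacency


def find_paths(adjacency, start, goal, max_hops=3):
    """Return every simple path from start to goal with at most max_hops edges.

    A path alternates nodes and relationships:
    [node, relationship, node, relationship, node, ...].
    """
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        hops_so_far = (len(path) - 1) // 2
        if hops_so_far >= max_hops:
            continue
        for relationship, target in adjacency.get(node, []):
            if target not in path[::2]:  # nodes sit at even positions
                stack.append((target, path + [relationship, target]))
    return paths


graph = build_adjacency(TRIPLES)
paths = find_paths(graph, "Trypanosoma cruzi", "cardiovascular condition")
for path in paths:
    print(" -> ".join(path))
```

A real biomedical knowledge graph would carry typed nodes, provenance, and far more edges, and would normally be queried through a graph database rather than an in-memory search.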
2.
Bioinformatics ; 38(12): 3252-3258, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35441678

ABSTRACT

MOTIVATION: As the number of public data resources continues to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. RESULTS: Developed through the National Heart, Lung and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15 911 study variables from public datasets. On a manually curated search dataset, Dug's total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results. AVAILABILITY AND IMPLEMENTATION: Dug is freely available at https://github.com/helxplatform/dug. An example Dug deployment is also available for use at https://search.biodatacatalyst.renci.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Search Engine , Semantics , Ecosystem , Abstracting and Indexing
3.
Bioinformatics ; 37(4): 586-587, 2021 05 01.
Article in English | MEDLINE | ID: mdl-33175089

ABSTRACT

SUMMARY: In response to the COVID-19 pandemic, we established COVID-KOP, a new knowledgebase integrating the existing Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP) biomedical knowledge graph with information from recent biomedical literature on COVID-19 annotated in the CORD-19 collection. COVID-KOP can be used effectively to generate new hypotheses concerning repurposing of known drugs and clinical drug candidates against COVID-19 by establishing respective confirmatory pathways of drug action. AVAILABILITY AND IMPLEMENTATION: COVID-KOP is freely accessible at https://covidkop.renci.org/. For code and instructions for the original ROBOKOP, see: https://github.com/NCATS-Gamma/robokop.


Subject(s)
COVID-19 , Databases, Factual , Humans , Knowledge Bases , Pandemics , SARS-CoV-2
4.
BMC Bioinformatics ; 22(1): 374, 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-34284719

ABSTRACT

BACKGROUND: As exome sequencing (ES) integrates into clinical practice, we should make every effort to utilize all information generated. Copy-number variation can lead to Mendelian disorders, but small copy-number variants (CNVs) often get overlooked or obscured by under-powered data collection. Many groups have developed methodology for detecting CNVs from ES, but existing methods often perform poorly for small CNVs and rely on large numbers of samples not always available to clinical laboratories. Furthermore, methods often rely on Bayesian approaches requiring user-defined priors in the setting of insufficient prior knowledge. This report first demonstrates the benefit of multiplexed exome capture (pooling samples prior to capture), then presents a novel detection algorithm, mcCNV ("multiplexed capture CNV"), built around multiplexed capture. RESULTS: We demonstrate: (1) multiplexed capture reduces inter-sample variance; (2) our mcCNV method, a novel depth-based algorithm for detecting CNVs from multiplexed capture ES data, improves the detection of small CNVs. We contrast our novel approach, agnostic to prior information, with the commonly used ExomeDepth. In a simulation study mcCNV demonstrated a favorable false discovery rate (FDR). When compared to calls made from matched genome sequencing, we find the mcCNV algorithm performs comparably to ExomeDepth. CONCLUSION: Implementing multiplexed capture increases power to detect single-exon CNVs. The novel mcCNV algorithm may provide a more favorable FDR than ExomeDepth. The greatest benefits of our approach derive from (1) not requiring a database of reference samples and (2) not requiring prior information about the prevalence or size of variants.


Subject(s)
DNA Copy Number Variations , Exome , Algorithms , Bayes Theorem , Exome/genetics , High-Throughput Nucleotide Sequencing , Exome Sequencing
5.
J Chem Inf Model ; 61(12): 5734-5741, 2021 12 27.
Article in English | MEDLINE | ID: mdl-34783553

ABSTRACT

The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-CoV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-target relationships from the research literature on COVID-19. SciBiteAI ontological tagging of the COVID Open Research Data set (CORD-19), a repository of COVID-19 scientific publications, was employed to identify drug-target relationships. Entity identifiers were resolved through lookup routines using UniProt and DrugBank. A custom algorithm was used to identify co-occurrences of the target protein and drug terms, and confidence scores were calculated for each entity pair. COKE processing of the current CORD-19 database identified about 3000 drug-protein pairs, including 29 unique proteins and 500 investigational, experimental, and approved drugs. Some of these drugs are presently undergoing clinical trials for COVID-19. The COKE repository and web application can serve as a useful resource for drug repurposing against SARS-CoV-2. COKE is freely available at https://coke.mml.unc.edu/, and the code is available at https://github.com/DnlRKorn/CoKE.


Subject(s)
COVID-19 , Pharmaceutical Preparations , Antiviral Agents , Drug Repositioning , Humans , Pandemics , SARS-CoV-2
6.
Bioinformatics ; 35(24): 5382-5384, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31410449

ABSTRACT

SUMMARY: Knowledge graphs (KGs) are quickly becoming a commonplace tool for storing relationships between entities from which higher-level reasoning can be conducted. KGs are typically stored in a graph-database format, and graph-database queries can be used to answer questions of interest that have been posed by users such as biomedical researchers. For simple queries, the inclusion of direct connections in the KG and the storage and analysis of query results are straightforward; however, for complex queries, these capabilities become exponentially more challenging with each increase in complexity of the query. For instance, one relatively complex query can yield a KG with hundreds of thousands of query results. Thus, the ability to efficiently query, store, rank and explore sub-graphs of a complex KG represents a major challenge to any effort designed to exploit the use of KGs for applications in biomedical research and other domains. We present Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways as an abstraction layer and user interface to more easily query KGs and store, rank and explore query results. AVAILABILITY AND IMPLEMENTATION: An instance of the ROBOKOP UI for exploration of the ROBOKOP Knowledge Graph can be found at http://robokop.renci.org. The ROBOKOP Knowledge Graph can be accessed at http://robokopkg.renci.org. Code and instructions for building and deploying ROBOKOP are available under the MIT open software license from https://github.com/NCATS-Gamma/robokop. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Pattern Recognition, Automated , Software , Databases, Factual
7.
J Chem Inf Model ; 59(12): 4968-4973, 2019 12 23.
Article in English | MEDLINE | ID: mdl-31769676

ABSTRACT

A proliferation of data sources has led to the notional existence of an implicit Knowledge Graph (KG) that contains vast amounts of biological knowledge contributed by distributed Application Programming Interfaces (APIs). However, challenges arise when integrating data across multiple APIs due to incompatible semantic types, identifier schemes, and data formats. We present ROBOKOP KG (http://robokopkg.renci.org), which is a KG that was initially built to support the open biomedical question-answering application, ROBOKOP (Reasoning Over Biomedical Objects linked in Knowledge-Oriented Pathways) (http://robokop.renci.org). Additionally, we present the ROBOKOP Knowledge Graph Builder (KGB), which constructs the KG and provides an extensible framework to handle graph query over and integration of federated data sources.


Subject(s)
Computer Graphics , Data Mining/methods , Knowledge Bases , Databases, Factual , User-Computer Interface
8.
Hum Mutat ; 39(11): 1690-1701, 2018 11.
Article in English | MEDLINE | ID: mdl-30311374

ABSTRACT

Effective exchange of information about genetic variants is currently hampered by the lack of readily available globally unique variant identifiers that would enable aggregation of information from different sources. The ClinGen Allele Registry addresses this problem by providing (1) globally unique "canonical" variant identifiers (CAids) on demand, either individually or in large batches; (2) access to variant-identifying information in a searchable Registry; (3) links to allele-related records in many commonly used databases; and (4) services for adding links to information about registered variants in external sources. A core element of the Registry is a canonicalization service, implemented using an in-memory sequence alignment-based index, which groups variant identifiers denoting the same nucleotide variant and assigns unique and dereferenceable CAids. More than 650 million distinct variants are currently registered, including those from gnomAD, ExAC, dbSNP, and ClinVar, as well as a small number of variants registered by Registry users. The Registry is accessible both via a web interface and programmatically via well-documented Hypertext Transfer Protocol (HTTP) Representational State Transfer Application Programming Interfaces (REST APIs). For programmatic interoperability, the Registry content is accessible in the JavaScript Object Notation for Linked Data (JSON-LD) format. We present several use cases and demonstrate how the linked information may provide raw material for reasoning about a variant's pathogenicity.


Subject(s)
Databases, Genetic , Genetic Variation/genetics , Alleles , Humans , Registries , Software
9.
Hum Mutat ; 39(11): 1686-1689, 2018 11.
Article in English | MEDLINE | ID: mdl-30311379

ABSTRACT

The Clinical Genome Resource (ClinGen)'s work to develop a knowledge base to support the understanding of genes and variants for use in precision medicine and research depends on robust, broadly applicable, and adaptable technical standards for sharing data and information. To forward this goal, ClinGen has joined with the Global Alliance for Genomics and Health (GA4GH) to support the development of open, freely-available technical standards and regulatory frameworks for secure and responsible sharing of genomic and health-related data. In its capacity as one of the 15 inaugural GA4GH "Driver Projects," ClinGen is providing input on the key standards needs of the global genomics community, and has committed to participate on GA4GH Work Streams to support the development of: (1) a standard model for computer-readable variant representation; (2) a data model for linking variant data to annotations; (3) a specification to enable sharing of genomic variant knowledge and associated clinical interpretations; and (4) a set of best practices for use of phenotype and disease ontologies. ClinGen's participation as a GA4GH Driver Project will provide a robust environment to test drive emerging genomic knowledge sharing standards and prove their utility among the community, while accelerating the construction of the ClinGen evidence base.


Subject(s)
Genome, Human/genetics , Information Dissemination/methods , Computational Biology , Databases, Genetic , Genetic Variation , Genomics , Humans , Precision Medicine
10.
Addict Biol ; 23(1): 461-473, 2018 01.
Article in English | MEDLINE | ID: mdl-28111843

ABSTRACT

Recent advances in genome-wide sequencing techniques and analytical methods allow for more comprehensive examinations of the genome than microarray-based genome-wide association studies (GWAS). The present report provides the first application of whole genome sequencing (WGS) to identify low frequency variants involved in cannabis dependence across two independent cohorts. The present study used low-coverage whole genome sequence data to conduct set-based association and enrichment analyses of low frequency variation in protein-coding regions as well as regulatory regions in relation to cannabis dependence. Two cohorts were studied: a population-based Native American tribal community consisting of 697 participants nested within large multi-generational pedigrees and a family-based sample of 1832 predominantly European ancestry participants largely nested within nuclear families. Participants in both samples were assessed for Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV) lifetime cannabis dependence, with 168 and 241 participants receiving a positive diagnosis in each sample, respectively. Sequence kernel association tests identified one protein-coding region, C1orf110, and one regulatory region in the MEF2B gene that achieved significance in a meta-analysis of both samples. A regulatory region within the PCCB gene, a gene previously associated with schizophrenia, exhibited a suggestive association. Finally, a significant enrichment of regions within or near genes with multiple splice variants or involved in cell adhesion or potassium channel activity was associated with cannabis dependence. This initial study demonstrates the potential utility of low pass whole genome sequencing for identifying genetic variants involved in the etiology of cannabis use disorders.


Subject(s)
Indians, North American/genetics , Marijuana Abuse/genetics , White People/genetics , Adult , Cohort Studies , Female , Genome-Wide Association Study , Genotype , Humans , MEF2 Transcription Factors/genetics , Male , Methylmalonyl-CoA Decarboxylase/genetics , Middle Aged , Polymorphism, Single Nucleotide , Potassium Channels/genetics , Whole Genome Sequencing
11.
Am J Hum Genet ; 94(2): 233-45, 2014 Feb 06.
Article in English | MEDLINE | ID: mdl-24507775

ABSTRACT

Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.


Subject(s)
Cholesterol, LDL/genetics , Exome , Gene Frequency , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Adult , Aged , Apolipoproteins E/blood , Apolipoproteins E/genetics , Cohort Studies , Dyslipidemias/blood , Dyslipidemias/genetics , Female , Follow-Up Studies , Genetic Code , Genotype , Humans , Lipase/genetics , Male , Middle Aged , Phenotype , Proprotein Convertase 9 , Proprotein Convertases/genetics , Receptors, LDL/genetics , Sequence Analysis, DNA , Serine Endopeptidases/genetics
12.
Genet Med ; 19(11): 1207-1216, 2017 11.
Article in English | MEDLINE | ID: mdl-28518170

ABSTRACT

PURPOSE: We investigated the diagnostic and clinical performance of exome sequencing in fetuses with sonographic abnormalities with normal karyotype and microarray and, in some cases, normal gene-specific sequencing. METHODS: Exome sequencing was performed on DNA from 15 anomalous fetuses and from the peripheral blood of their parents. Parents provided consent to be informed of diagnostic results in the fetus, medically actionable findings in the parents, and their identification as carrier couples for significant autosomal recessive conditions. We assessed the perceptions and understanding of exome sequencing using mixed methods in 15 mother-father dyads. RESULTS: In seven (47%) of 15 fetuses, exome sequencing provided a diagnosis or possible diagnosis with identification of variants in the following genes: COL1A1, MUSK, KCTD1, RTTN, TMEM67, PIEZO1 and DYNC2H1. One additional case revealed a de novo nonsense mutation in a novel candidate gene (MAP4K4). The perceived likelihood that exome sequencing would explain the results (5.2 on a 10-point scale) was higher than the approximately 30% diagnostic yield discussed in pretest counseling. CONCLUSION: Exome sequencing had diagnostic utility in a highly select population of fetuses where a genetic diagnosis was highly suspected. Challenges related to genetics literacy and variant interpretation must be addressed by highly tailored pre- and posttest genetic counseling.


Subject(s)
Exome , Fetal Diseases/diagnosis , Fetal Diseases/genetics , Prenatal Diagnosis/methods , Sequence Analysis, DNA , Adult , Fathers , Female , Fetal Development/genetics , Fetal Diseases/diagnostic imaging , Fetus , Humans , Karyotype , Male , Mothers , Pregnancy , Pregnancy Complications , Prospective Studies , Protein Array Analysis , Retrospective Studies , Socioeconomic Factors , Ultrasonography, Prenatal
13.
Am J Med Genet B Neuropsychiatr Genet ; 174(5): 557-567, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28440896

ABSTRACT

Nicotine dependence (ND) has a reported heritability of 40-70%. Low-coverage whole-genome sequencing was conducted in 1,889 samples from the UCSF Family study. Linear mixed models were used to conduct genome-wide association (GWA) tests of ND in this and five cohorts obtained from the database of Genotypes and Phenotypes. Fixed-effect meta-analysis was carried out separately for European (n = 14,713) and African (n = 3,369) participants, and then in a combined analysis of both ancestral groups. The meta-analysis of African participants identified a significant and novel susceptibility signal (rs56247223; p = 4.11 × 10⁻⁸). Data from the Genotype-Tissue Expression (GTEx) study suggested the protective allele is associated with reduced mRNA expression of CACNA2D3 in three human brain tissues (p < 4.94 × 10⁻²). Sequence data from the UCSF Family study suggested that a rare nonsynonymous variant in this gene conferred increased risk for ND (p = 0.01), providing further support for CACNA2D3 involvement in ND. Suggestive associations were observed in six additional regions in both European and merged populations (p < 5.00 × 10⁻⁶). The top variants were found to regulate mRNA expression levels of genes in human brains using GTEx data (p < 0.05): HAX1 and CHRNB2 (rs1760803), ADAMTSL1 (rs17198023), PEX2 (rs12680810), GLIS3 (rs12348139), non-coding RNA for LINC00476 (rs10759883), and GABBR1 (rs56020557 and rs62392942). A gene-based association test further supported the relation between GABBR1 and ND (p = 6.36 × 10⁻⁷). These findings will inform the biological mechanisms and development of therapeutic targets for ND.

14.
N Engl J Med ; 366(2): 141-9, 2012 Jan 12.
Article in English | MEDLINE | ID: mdl-22236224

ABSTRACT

BACKGROUND: Family history is a significant risk factor for prostate cancer, although the molecular basis for this association is poorly understood. Linkage studies have implicated chromosome 17q21-22 as a possible location of a prostate-cancer susceptibility gene. METHODS: We screened more than 200 genes in the 17q21-22 region by sequencing germline DNA from 94 unrelated patients with prostate cancer from families selected for linkage to the candidate region. We tested family members, additional case subjects, and control subjects to characterize the frequency of the identified mutations. RESULTS: Probands from four families were discovered to have a rare but recurrent mutation (G84E) in HOXB13 (rs138213197), a homeobox transcription factor gene that is important in prostate development. All 18 men with prostate cancer and available DNA in these four families carried the mutation. The carrier rate of the G84E mutation was increased by a factor of approximately 20 in 5083 unrelated subjects of European descent who had prostate cancer, with the mutation found in 72 subjects (1.4%), as compared with 1 in 1401 control subjects (0.1%) (P=8.5×10⁻⁷). The mutation was significantly more common in men with early-onset, familial prostate cancer (3.1%) than in those with late-onset, nonfamilial prostate cancer (0.6%) (P=2.0×10⁻⁶). CONCLUSIONS: The novel HOXB13 G84E variant is associated with a significantly increased risk of hereditary prostate cancer. Although the variant accounts for a small fraction of all prostate cancers, this finding has implications for prostate-cancer risk assessment and may provide new mechanistic insights into this common cancer. (Funded by the National Institutes of Health and others.).


Subject(s)
Germ-Line Mutation , Homeodomain Proteins/genetics , Prostatic Neoplasms/genetics , Chromosomes, Human, Pair 17 , Genetic Linkage , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Pedigree , Prostate/pathology , Prostatic Neoplasms/pathology , Sequence Analysis, DNA
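
The "factor of approximately 20" reported in the HOXB13 abstract above can be checked from the counts it gives. The sketch below is only a crude ratio of carrier proportions, not the statistical test used in the study, and the variable names are illustrative.

```python
# Counts as reported in the abstract.
case_carriers, case_total = 72, 5083
control_carriers, control_total = 1, 1401

case_rate = case_carriers / case_total
control_rate = control_carriers / control_total

# Crude fold increase: ratio of the two carrier proportions.
fold_increase = case_rate / control_rate

print(f"cases: {case_rate:.1%}, controls: {control_rate:.2%}, ratio: {fold_increase:.1f}")
# prints "cases: 1.4%, controls: 0.07%, ratio: 19.8"
```

The unrounded control rate is about 0.07%, which the abstract reports as 0.1%. Dividing the rounded percentages (1.4 / 0.1) would give 14, so the factor of roughly 20 follows from the raw counts.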
15.
BMC Genomics ; 15: 85, 2014 Jan 30.
Article in English | MEDLINE | ID: mdl-24479562

ABSTRACT

BACKGROUND: The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that makes a low-coverage sequencing strategy viable. RESULTS: We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using the variant calling routines GATK Unified Genotyper program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. CONCLUSIONS: Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggests that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.


Subject(s)
Genome, Human , Indians, North American/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Cohort Studies , Exome , Gene Frequency , Genetic Variation , Genotype , High-Throughput Nucleotide Sequencing , Humans , Linkage Disequilibrium , Middle Aged , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Software , Young Adult
16.
Bioinformatics ; 29(21): 2744-9, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-23956302

ABSTRACT

SUMMARY: Although the 1000 Genomes haplotypes are the most commonly used reference panel for imputation, medical sequencing projects are generating large alternate sets of sequenced samples. Imputation in African Americans using 3384 haplotypes from the Exome Sequencing Project, compared with 2184 haplotypes from 1000 Genomes Project, increased effective sample size by 8.3-11.4% for coding variants with minor allele frequency <1%. No loss of imputation quality was observed using a panel built from phenotypic extremes. We recommend using haplotypes from Exome Sequencing Project alone or concatenation of the two panels over quality score-based post-imputation selection or IMPUTE2's two-panel combination. CONTACT: yunli@med.unc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Black or African American/genetics , Exome , Genetic Variation , Sequence Analysis, DNA/methods , Gene Frequency , Genome, Human , Genome-Wide Association Study , Haplotypes , Humans , Phenotype , Polymorphism, Single Nucleotide
17.
Am J Med Genet B Neuropsychiatr Genet ; 165B(8): 673-83, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25270064

ABSTRACT

Higher rates of alcohol use and other drug-dependence have been observed in some Native American (NA) populations relative to other ethnic groups in the US. Previous studies have shown that alcohol dehydrogenase (ADH) genes and aldehyde dehydrogenase (ALDH) genes may affect the risk of development of alcohol dependence, and that polymorphisms within these genes may differentially affect risk for the disorder depending on the ethnic group evaluated. We evaluated variations in the ADH and ALDH genes in a large study investigating risk factors for substance use in a NA population. We assessed ancestry admixture and tested for associations with alcohol-related phenotypes in the genomic regions around the ADH1-7, ALDH2, and ALDH1A1 genes. Seventy-two ADH variants showed significant evidence of association with a severity level of alcohol drinking-related dependence symptoms phenotype. These significant variants spanned the entire seven-gene ADH cluster region. Two significant associations, one in ADH and one in ALDH2, were observed with alcohol dependence diagnosis. Seventeen variants showed significant association with the largest number of alcohol drinks ingested during any 24-hour period. Variants in or near ADH7 were significantly negatively associated with alcohol-related phenotypes, suggesting a potential protective effect of this gene. In addition, our results suggested that a higher degree of NA ancestry is associated with higher frequencies of potential risk variants and lower frequencies of potential protective variants for alcohol dependence phenotypes.


Subject(s)
Alcohol Dehydrogenase/genetics , Alcoholism/genetics , Aldehyde Dehydrogenase/genetics , Genetic Variation/genetics , Indians, North American/genetics , Polymorphism, Genetic/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Phenotype , Sequence Analysis, DNA , Young Adult
18.
Genet Med ; 15(1): 36-44, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22995991

ABSTRACT

PURPOSE: Next-generation sequencing has transformed genetic research and is poised to revolutionize clinical diagnosis. However, the vast amount of data and inevitable discovery of incidental findings require novel analytic approaches. We therefore implemented for the first time a strategy that utilizes an a priori structured framework and a conservative threshold for selecting clinically relevant incidental findings. METHODS: We categorized 2,016 genes linked with Mendelian diseases into "bins" based on clinical utility and validity, and used a computational algorithm to analyze 80 whole-genome sequences in order to explore the use of such an approach in a simulated real-world setting. RESULTS: The algorithm effectively reduced the number of variants requiring human review and identified incidental variants with likely clinical relevance. Incorporation of the Human Gene Mutation Database improved the yield for missense mutations but also revealed that a substantial proportion of purported disease-causing mutations were misleading. CONCLUSION: This approach is adaptable to any clinically relevant bin structure, scalable to the demands of a clinical laboratory workflow, and flexible with respect to advances in genomics. We anticipate that application of this strategy will facilitate pretest informed consent, laboratory analysis, and posttest return of results in a clinical context.


Subject(s)
Genome-Wide Association Study/methods , Genomics/methods , Algorithms , Alleles , Databases, Genetic , Gene Frequency , Humans , Mutation
19.
J Clin Transl Sci ; 7(1): e214, 2023.
Article in English | MEDLINE | ID: mdl-37900350

ABSTRACT

Knowledge graphs have become a common approach for knowledge representation. Yet, the application of graph methodology is elusive due to the sheer number and complexity of knowledge sources. In addition, semantic incompatibilities hinder efforts to harmonize and integrate across these diverse sources. As part of The Biomedical Translator Consortium, we have developed a knowledge graph-based question-answering system designed to augment human reasoning and accelerate translational scientific discovery: the Translator system. We have applied the Translator system to answer biomedical questions in the context of a broad array of diseases and syndromes, including Fanconi anemia, primary ciliary dyskinesia, multiple sclerosis, and others. A variety of collaborative approaches have been used to research and develop the Translator system. One recent approach involved the establishment of a monthly "Question-of-the-Month (QotM) Challenge" series. Herein, we describe the structure of the QotM Challenge; the six challenges that have been conducted to date on drug-induced liver injury, cannabidiol toxicity, coronavirus infection, diabetes, psoriatic arthritis, and ATP1A3-related phenotypes; the scientific insights that have been gleaned during the challenges; and the technical issues that were identified over the course of the challenges and that can now be addressed to foster further development of the prototype Translator system. We close with a discussion on Large Language Models such as ChatGPT and highlight differences between those models and the Translator system.

20.
BMC Bioinformatics ; 13: 221, 2012 Sep 04.
Article in English | MEDLINE | ID: mdl-22946927

ABSTRACT

BACKGROUND: Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. RESULTS: Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. CONCLUSION: ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration.


Subject(s)
High-Throughput Nucleotide Sequencing/standards , Software , Calibration , Genome , Logistic Models , Sequence Alignment
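
The ReQON abstract above describes quality scores that should correspond to the probability of a sequencing error. Under the standard Phred convention used for base qualities, a score Q implies an error probability of 10^(-Q/10). The sketch below shows only that conversion; it is not ReQON's recalibration (the abstract states that ReQON is an R package using logistic regression), and the function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the implied error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


# A well-calibrated Q20 base should be wrong about 1 time in 100,
# and a Q30 base about 1 time in 1000.
for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```

Calibration can then be judged by comparing the empirical error rate among bases reported at a given Q with the probability returned by `phred_to_error_prob(Q)`.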