Results 1 - 20 of 44
1.
Bioinformatics; 40(1), 2024 01 02.
Article in English | MEDLINE | ID: mdl-38175789

ABSTRACT

SUMMARY: Knowledge graphs are being increasingly used in biomedical research to link large amounts of heterogeneous data and facilitate reasoning across diverse knowledge sources. Wider adoption and exploration of knowledge graphs in the biomedical research community is limited by the need to understand the underlying graph structure in terms of entity types and relationships, represented as nodes and edges, respectively, and to learn specialized query languages for graph mining and exploration. We have developed a user-friendly interface dubbed ExEmPLAR (Extracting, Exploring, and Embedding Pathways Leading to Actionable Research) to aid reasoning over biomedical knowledge graphs and assist with data-driven research and hypothesis generation. We explain the key functionalities of ExEmPLAR and demonstrate its use with a case study considering the relationship of Trypanosoma cruzi, the etiological agent of Chagas disease, to frequently associated cardiovascular conditions. AVAILABILITY AND IMPLEMENTATION: ExEmPLAR is freely accessible at https://www.exemplar.mml.unc.edu/. For code and instructions for using the application, see: https://github.com/beasleyjonm/AOP-COP-Path-Extractor.


Subjects
Biomedical Research; Pattern Recognition, Automated
2.
Bioinformatics; 38(12): 3252-3258, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35441678

ABSTRACT

MOTIVATION: As the number of public data resources continues to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. RESULTS: Developed through the National Heart, Lung and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15 911 study variables from public datasets. On a manually curated search dataset, Dug's total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results. AVAILABILITY AND IMPLEMENTATION: Dug is freely available at https://github.com/helxplatform/dug. An example Dug deployment is also available for use at https://search.biodatacatalyst.renci.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Search Engine; Semantics; Ecosystem; Abstracting and Indexing
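
The Dug abstract above compares search tools by recall and by the precision of the top results. As a rough illustration only, the sketch below computes these two metrics using their conventional information-retrieval definitions, which may differ from the exact evaluation protocol used in the paper. The function names and the variable identifiers are hypothetical and do not come from Dug or Elasticsearch.

```python
from typing import Sequence, Set


def recall(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the relevant items that appear anywhere in the results."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    top_k = list(retrieved)[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)


# Hypothetical study-variable identifiers, for illustration only.
results = ["var_a", "var_b", "var_c", "var_d"]
relevant_set = {"var_a", "var_c", "var_e"}

print(recall(results, relevant_set))               # 2 of 3 relevant items found
print(precision_at_k(results, relevant_set, k=2))  # 1 of the top 2 is relevant
```

Reporting both numbers matters for a claim like the one in the abstract: a search expansion that raises recall is only useful if the precision of the first few results does not drop.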
3.
Bioinformatics; 37(4): 586-587, 2021 05 01.
Article in English | MEDLINE | ID: mdl-33175089

ABSTRACT

SUMMARY: In response to the COVID-19 pandemic, we established COVID-KOP, a new knowledgebase integrating the existing Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP) biomedical knowledge graph with information from recent biomedical literature on COVID-19 annotated in the CORD-19 collection. COVID-KOP can be used effectively to generate new hypotheses concerning repurposing of known drugs and clinical drug candidates against COVID-19 by establishing respective confirmatory pathways of drug action. AVAILABILITY AND IMPLEMENTATION: COVID-KOP is freely accessible at https://covidkop.renci.org/. For code and instructions for the original ROBOKOP, see: https://github.com/NCATS-Gamma/robokop.


Subjects
COVID-19; Databases, Factual; Humans; Knowledge Bases; Pandemics; SARS-CoV-2
4.
BMC Bioinformatics; 22(1): 374, 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-34284719

ABSTRACT

BACKGROUND: As exome sequencing (ES) integrates into clinical practice, we should make every effort to utilize all information generated. Copy-number variation can lead to Mendelian disorders, but small copy-number variants (CNVs) often get overlooked or obscured by under-powered data collection. Many groups have developed methodology for detecting CNVs from ES, but existing methods often perform poorly for small CNVs and rely on large numbers of samples not always available to clinical laboratories. Furthermore, methods often rely on Bayesian approaches requiring user-defined priors in the setting of insufficient prior knowledge. This report first demonstrates the benefit of multiplexed exome capture (pooling samples prior to capture), then presents a novel detection algorithm, mcCNV ("multiplexed capture CNV"), built around multiplexed capture. RESULTS: We demonstrate: (1) multiplexed capture reduces inter-sample variance; (2) our mcCNV method, a novel depth-based algorithm for detecting CNVs from multiplexed capture ES data, improves the detection of small CNVs. We contrast our novel approach, agnostic to prior information, with the commonly used ExomeDepth. In a simulation study mcCNV demonstrated a favorable false discovery rate (FDR). When compared to calls made from matched genome sequencing, we find the mcCNV algorithm performs comparably to ExomeDepth. CONCLUSION: Implementing multiplexed capture increases power to detect single-exon CNVs. The novel mcCNV algorithm may provide a more favorable FDR than ExomeDepth. The greatest benefits of our approach derive from (1) not requiring a database of reference samples and (2) not requiring prior information about the prevalence or size of variants.


Subjects
DNA Copy Number Variations; Exome; Algorithms; Bayes Theorem; Exome/genetics; High-Throughput Nucleotide Sequencing; Exome Sequencing
5.
J Chem Inf Model; 61(12): 5734-5741, 2021 12 27.
Article in English | MEDLINE | ID: mdl-34783553

ABSTRACT

The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-CoV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-target relationships from the research literature on COVID-19. SciBiteAI ontological tagging of the COVID Open Research Data set (CORD-19), a repository of COVID-19 scientific publications, was employed to identify drug-target relationships. Entity identifiers were resolved through lookup routines using UniProt and DrugBank. A custom algorithm was used to identify co-occurrences of the target protein and drug terms, and confidence scores were calculated for each entity pair. COKE processing of the current CORD-19 database identified about 3000 drug-protein pairs, including 29 unique proteins and 500 investigational, experimental, and approved drugs. Some of these drugs are presently undergoing clinical trials for COVID-19. The COKE repository and web application can serve as a useful resource for drug repurposing against SARS-CoV-2. COKE is freely available at https://coke.mml.unc.edu/, and the code is available at https://github.com/DnlRKorn/CoKE.


Subjects
COVID-19; Pharmaceutical Preparations; Antiviral Agents; Drug Repositioning; Humans; Pandemics; SARS-CoV-2
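
The COKE abstract above mentions identifying co-occurrences of drug and target terms and scoring each entity pair, without giving the algorithm. The sketch below is therefore not COKE's method. It is a minimal stand-in that counts document-level co-occurrences by substring matching and uses the fraction of documents containing a pair as a toy confidence score. All names, the scoring rule, and the placeholder entities are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def cooccurrence_counts(documents: Iterable[str],
                        drugs: List[str],
                        targets: List[str]) -> Counter:
    """Count documents in which a drug term and a target term both appear."""
    counts: Counter = Counter()
    for text in documents:
        lowered = text.lower()
        found_drugs = [d for d in drugs if d.lower() in lowered]
        found_targets = [t for t in targets if t.lower() in lowered]
        # Every (drug, target) combination present in this document counts once.
        counts.update(product(found_drugs, found_targets))
    return counts


def toy_confidence(counts: Counter, n_documents: int) -> Dict[Pair, float]:
    """Hypothetical score: share of documents mentioning both entities."""
    return {pair: c / n_documents for pair, c in counts.items()}


# Placeholder entities and texts; these are not real drugs, proteins, or papers.
docs = [
    "drug_a binds protein_x",
    "drug_a and protein_x again",
    "drug_b with protein_y",
]
counts = cooccurrence_counts(docs, ["drug_a", "drug_b"], ["protein_x", "protein_y"])
scores = toy_confidence(counts, len(docs))
print(counts)
print(scores)
```

A production pipeline would match resolved entity identifiers (the abstract mentions UniProt and DrugBank lookups) rather than raw substrings, which is one reason this sketch should be read as a toy.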
6.
Bioinformatics; 35(24): 5382-5384, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31410449

ABSTRACT

SUMMARY: Knowledge graphs (KGs) are quickly becoming a commonplace tool for storing relationships between entities from which higher-level reasoning can be conducted. KGs are typically stored in a graph-database format, and graph-database queries can be used to answer questions of interest that have been posed by users such as biomedical researchers. For simple queries, the inclusion of direct connections in the KG and the storage and analysis of query results are straightforward; however, for complex queries, these capabilities become exponentially more challenging with each increase in complexity of the query. For instance, one relatively complex query can yield a KG with hundreds of thousands of query results. Thus, the ability to efficiently query, store, rank and explore sub-graphs of a complex KG represents a major challenge to any effort designed to exploit the use of KGs for applications in biomedical research and other domains. We present Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP) as an abstraction layer and user interface to more easily query KGs and store, rank and explore query results. AVAILABILITY AND IMPLEMENTATION: An instance of the ROBOKOP UI for exploration of the ROBOKOP Knowledge Graph can be found at http://robokop.renci.org. The ROBOKOP Knowledge Graph can be accessed at http://robokopkg.renci.org. Code and instructions for building and deploying ROBOKOP are available under the MIT open software license from https://github.com/NCATS-Gamma/robokop. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Pattern Recognition, Automated; Software; Databases, Factual
7.
J Chem Inf Model; 59(12): 4968-4973, 2019 12 23.
Article in English | MEDLINE | ID: mdl-31769676

ABSTRACT

A proliferation of data sources has led to the notional existence of an implicit Knowledge Graph (KG) that contains vast amounts of biological knowledge contributed by distributed Application Programming Interfaces (APIs). However, challenges arise when integrating data across multiple APIs due to incompatible semantic types, identifier schemes, and data formats. We present ROBOKOP KG (http://robokopkg.renci.org), which is a KG that was initially built to support the open biomedical question-answering application, ROBOKOP (Reasoning Over Biomedical Objects linked in Knowledge-Oriented Pathways) (http://robokop.renci.org). Additionally, we present the ROBOKOP Knowledge Graph Builder (KGB), which constructs the KG and provides an extensible framework to handle graph query over and integration of federated data sources.


Subjects
Computer Graphics; Data Mining/methods; Knowledge Bases; Databases, Factual; User-Computer Interface
8.
Hum Mutat; 39(11): 1690-1701, 2018 11.
Article in English | MEDLINE | ID: mdl-30311374

ABSTRACT

Effective exchange of information about genetic variants is currently hampered by the lack of readily available globally unique variant identifiers that would enable aggregation of information from different sources. The ClinGen Allele Registry addresses this problem by providing (1) globally unique "canonical" variant identifiers (CAids) on demand, either individually or in large batches; (2) access to variant-identifying information in a searchable Registry; (3) links to allele-related records in many commonly used databases; and (4) services for adding links to information about registered variants in external sources. A core element of the Registry is a canonicalization service, implemented using an in-memory sequence alignment-based index, which groups variant identifiers denoting the same nucleotide variant and assigns unique and dereferenceable CAids. More than 650 million distinct variants are currently registered, including those from gnomAD, ExAC, dbSNP, and ClinVar, as well as a small number of variants registered by Registry users. The Registry is accessible both via a web interface and programmatically via well-documented Hypertext Transfer Protocol (HTTP) Representational State Transfer Application Programming Interfaces (REST APIs). For programmatic interoperability, the Registry content is accessible in the JavaScript Object Notation for Linked Data (JSON-LD) format. We present several use cases and demonstrate how the linked information may provide raw material for reasoning about a variant's pathogenicity.


Subjects
Databases, Genetic; Genetic Variation/genetics; Alleles; Humans; Registries; Software
9.
Hum Mutat; 39(11): 1686-1689, 2018 11.
Article in English | MEDLINE | ID: mdl-30311379

ABSTRACT

The Clinical Genome Resource (ClinGen)'s work to develop a knowledge base to support the understanding of genes and variants for use in precision medicine and research depends on robust, broadly applicable, and adaptable technical standards for sharing data and information. To advance this goal, ClinGen has joined with the Global Alliance for Genomics and Health (GA4GH) to support the development of open, freely available technical standards and regulatory frameworks for secure and responsible sharing of genomic and health-related data. In its capacity as one of the 15 inaugural GA4GH "Driver Projects," ClinGen is providing input on the key standards needs of the global genomics community, and has committed to participate in GA4GH Work Streams to support the development of: (1) a standard model for computer-readable variant representation; (2) a data model for linking variant data to annotations; (3) a specification to enable sharing of genomic variant knowledge and associated clinical interpretations; and (4) a set of best practices for use of phenotype and disease ontologies. ClinGen's participation as a GA4GH Driver Project will provide a robust environment to test drive emerging genomic knowledge sharing standards and prove their utility among the community, while accelerating the construction of the ClinGen evidence base.


Subjects
Genome, Human/genetics; Information Dissemination/methods; Computational Biology; Databases, Genetic; Genetic Variation; Genomics; Humans; Precision Medicine
10.
Addict Biol; 23(1): 461-473, 2018 01.
Article in English | MEDLINE | ID: mdl-28111843

ABSTRACT

Recent advances in genome-wide sequencing techniques and analytical methods allow for more comprehensive examinations of the genome than microarray-based genome-wide association studies (GWAS). The present report provides the first application of whole genome sequencing (WGS) to identify low frequency variants involved in cannabis dependence across two independent cohorts. The present study used low-coverage whole genome sequence data to conduct set-based association and enrichment analyses of low frequency variation in protein-coding regions as well as regulatory regions in relation to cannabis dependence. Two cohorts were studied: a population-based Native American tribal community consisting of 697 participants nested within large multi-generational pedigrees and a family-based sample of 1832 predominantly European ancestry participants largely nested within nuclear families. Participants in both samples were assessed for Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV) lifetime cannabis dependence, with 168 and 241 participants receiving a positive diagnosis in each sample, respectively. Sequence kernel association tests identified one protein-coding region, C1orf110, and one regulatory region in the MEF2B gene that achieved significance in a meta-analysis of both samples. A regulatory region within the PCCB gene, a gene previously associated with schizophrenia, exhibited a suggestive association. Finally, regions within or near genes with multiple splice variants or involved in cell adhesion or potassium channel activity were significantly enriched for association with cannabis dependence. This initial study demonstrates the potential utility of low-pass whole genome sequencing for identifying genetic variants involved in the etiology of cannabis use disorders.


Subjects
Indians, North American/genetics; Marijuana Abuse/genetics; White People/genetics; Adult; Cohort Studies; Female; Genome-Wide Association Study; Genotype; Humans; MEF2 Transcription Factors/genetics; Male; Methylmalonyl-CoA Decarboxylase/genetics; Middle Aged; Polymorphism, Single Nucleotide; Potassium Channels/genetics; Whole Genome Sequencing
11.
Am J Hum Genet; 94(2): 233-45, 2014 Feb 06.
Article in English | MEDLINE | ID: mdl-24507775

ABSTRACT

Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.


Subjects
Cholesterol, LDL/genetics; Exome; Gene Frequency; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Adult; Aged; Apolipoproteins E/blood; Apolipoproteins E/genetics; Cohort Studies; Dyslipidemias/blood; Dyslipidemias/genetics; Female; Follow-Up Studies; Genetic Code; Genotype; Humans; Lipase/genetics; Male; Middle Aged; Phenotype; Proprotein Convertase 9; Proprotein Convertases/genetics; Receptors, LDL/genetics; Sequence Analysis, DNA; Serine Endopeptidases/genetics
12.
Genet Med; 19(11): 1207-1216, 2017 11.
Article in English | MEDLINE | ID: mdl-28518170

ABSTRACT

PURPOSE: We investigated the diagnostic and clinical performance of exome sequencing in fetuses with sonographic abnormalities with normal karyotype and microarray and, in some cases, normal gene-specific sequencing. METHODS: Exome sequencing was performed on DNA from 15 anomalous fetuses and from the peripheral blood of their parents. Parents provided consent to be informed of diagnostic results in the fetus, medically actionable findings in the parents, and their identification as carrier couples for significant autosomal recessive conditions. We assessed the perceptions and understanding of exome sequencing using mixed methods in 15 mother-father dyads. RESULTS: In seven (47%) of 15 fetuses, exome sequencing provided a diagnosis or possible diagnosis with identification of variants in the following genes: COL1A1, MUSK, KCTD1, RTTN, TMEM67, PIEZO1 and DYNC2H1. One additional case revealed a de novo nonsense mutation in a novel candidate gene (MAP4K4). The perceived likelihood that exome sequencing would explain the results (5.2 on a 10-point scale) was higher than the approximately 30% diagnostic yield discussed in pretest counseling. CONCLUSION: Exome sequencing had diagnostic utility in a highly select population of fetuses where a genetic diagnosis was highly suspected. Challenges related to genetics literacy and variant interpretation must be addressed by highly tailored pre- and posttest genetic counseling.


Subjects
Exome; Fetal Diseases/diagnosis; Fetal Diseases/genetics; Prenatal Diagnosis/methods; Sequence Analysis, DNA; Adult; Fathers; Female; Fetal Development/genetics; Fetal Diseases/diagnostic imaging; Fetus; Humans; Karyotype; Male; Mothers; Pregnancy; Pregnancy Complications; Prospective Studies; Protein Array Analysis; Retrospective Studies; Socioeconomic Factors; Ultrasonography, Prenatal
13.
Am J Med Genet B Neuropsychiatr Genet; 174(5): 557-567, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28440896

ABSTRACT

Nicotine dependence (ND) has a reported heritability of 40-70%. Low-coverage whole-genome sequencing was conducted in 1,889 samples from the UCSF Family study. Linear mixed models were used to conduct genome-wide association (GWA) tests of ND in this and five cohorts obtained from the database of Genotypes and Phenotypes. Fixed-effect meta-analysis was carried out separately for European (n = 14,713) and African (n = 3,369) participants, and then in a combined analysis of both ancestral groups. The meta-analysis of African participants identified a significant and novel susceptibility signal (rs56247223; p = 4.11 × 10^-8). Data from the Genotype-Tissue Expression (GTEx) study suggested the protective allele is associated with reduced mRNA expression of CACNA2D3 in three human brain tissues (p < 4.94 × 10^-2). Sequence data from the UCSF Family study suggested that a rare nonsynonymous variant in this gene conferred increased risk for ND (p = 0.01), providing further support for CACNA2D3 involvement in ND. Suggestive associations were observed in six additional regions in both European and merged populations (p < 5.00 × 10^-6). The top variants were found to regulate mRNA expression levels of genes in human brains using GTEx data (p < 0.05): HAX1 and CHRNB2 (rs1760803), ADAMTSL1 (rs17198023), PEX2 (rs12680810), GLIS3 (rs12348139), non-coding RNA for LINC00476 (rs10759883), and GABBR1 (rs56020557 and rs62392942). A gene-based association test further supported the relation between GABBR1 and ND (p = 6.36 × 10^-7). These findings will inform the biological mechanisms and development of therapeutic targets for ND.

14.
N Engl J Med; 366(2): 141-9, 2012 Jan 12.
Article in English | MEDLINE | ID: mdl-22236224

ABSTRACT

BACKGROUND: Family history is a significant risk factor for prostate cancer, although the molecular basis for this association is poorly understood. Linkage studies have implicated chromosome 17q21-22 as a possible location of a prostate-cancer susceptibility gene. METHODS: We screened more than 200 genes in the 17q21-22 region by sequencing germline DNA from 94 unrelated patients with prostate cancer from families selected for linkage to the candidate region. We tested family members, additional case subjects, and control subjects to characterize the frequency of the identified mutations. RESULTS: Probands from four families were discovered to have a rare but recurrent mutation (G84E) in HOXB13 (rs138213197), a homeobox transcription factor gene that is important in prostate development. All 18 men with prostate cancer and available DNA in these four families carried the mutation. The carrier rate of the G84E mutation was increased by a factor of approximately 20 in 5083 unrelated subjects of European descent who had prostate cancer, with the mutation found in 72 subjects (1.4%), as compared with 1 in 1401 control subjects (0.1%) (P = 8.5 × 10^-7). The mutation was significantly more common in men with early-onset, familial prostate cancer (3.1%) than in those with late-onset, nonfamilial prostate cancer (0.6%) (P = 2.0 × 10^-6). CONCLUSIONS: The novel HOXB13 G84E variant is associated with a significantly increased risk of hereditary prostate cancer. Although the variant accounts for a small fraction of all prostate cancers, this finding has implications for prostate-cancer risk assessment and may provide new mechanistic insights into this common cancer. (Funded by the National Institutes of Health and others.)


Subjects
Germ-Line Mutation; Homeodomain Proteins/genetics; Prostatic Neoplasms/genetics; Chromosomes, Human, Pair 17; Genetic Linkage; High-Throughput Nucleotide Sequencing; Humans; Male; Middle Aged; Pedigree; Prostate/pathology; Prostatic Neoplasms/pathology; Sequence Analysis, DNA
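
The "factor of approximately 20" in the HOXB13 abstract above can be checked from the counts it reports. The short sketch below redoes that arithmetic, assuming that "1 in 1401 control subjects" means one carrier among 1401 controls. The variable names are illustrative, and the calculation is a plain ratio of carrier rates, not the statistical test used in the paper.

```python
# Counts as reported in the abstract.
cases_total, cases_carriers = 5083, 72
controls_total, controls_carriers = 1401, 1  # assumed reading: 1 carrier among 1401 controls

case_rate = cases_carriers / cases_total
control_rate = controls_carriers / controls_total
fold_increase = case_rate / control_rate

print(f"case carrier rate:    {case_rate:.1%}")
print(f"control carrier rate: {control_rate:.1%}")
print(f"fold increase:        {fold_increase:.1f}")
```

With a single carrier among the controls, the ratio is sensitive to small changes in that count, which is why the abstract reports it only approximately.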
15.
BMC Genomics; 15: 85, 2014 Jan 30.
Article in English | MEDLINE | ID: mdl-24479562

ABSTRACT

BACKGROUND: The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that makes a low-coverage sequencing strategy viable. RESULTS: We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using the variant calling routines GATK Unified Genotyper program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. CONCLUSIONS: Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggests that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.


Subjects
Genome, Human; Indians, North American/genetics; Adolescent; Adult; Aged; Aged, 80 and over; Cohort Studies; Exome; Gene Frequency; Genetic Variation; Genotype; High-Throughput Nucleotide Sequencing; Humans; Linkage Disequilibrium; Middle Aged; Oligonucleotide Array Sequence Analysis; Polymorphism, Single Nucleotide; Software; Young Adult
16.
Bioinformatics; 29(21): 2744-9, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-23956302

ABSTRACT

SUMMARY: Although the 1000 Genomes haplotypes are the most commonly used reference panel for imputation, medical sequencing projects are generating large alternate sets of sequenced samples. Imputation in African Americans using 3384 haplotypes from the Exome Sequencing Project, compared with 2184 haplotypes from 1000 Genomes Project, increased effective sample size by 8.3-11.4% for coding variants with minor allele frequency <1%. No loss of imputation quality was observed using a panel built from phenotypic extremes. We recommend using haplotypes from Exome Sequencing Project alone or concatenation of the two panels over quality score-based post-imputation selection or IMPUTE2's two-panel combination. CONTACT: yunli@med.unc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Black or African American/genetics; Exome; Genetic Variation; Sequence Analysis, DNA/methods; Gene Frequency; Genome, Human; Genome-Wide Association Study; Haplotypes; Humans; Phenotype; Polymorphism, Single Nucleotide
17.
Am J Med Genet B Neuropsychiatr Genet; 165B(8): 673-83, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25270064

ABSTRACT

Higher rates of alcohol use and other drug-dependence have been observed in some Native American (NA) populations relative to other ethnic groups in the US. Previous studies have shown that alcohol dehydrogenase (ADH) genes and aldehyde dehydrogenase (ALDH) genes may affect the risk of development of alcohol dependence, and that polymorphisms within these genes may differentially affect risk for the disorder depending on the ethnic group evaluated. We evaluated variations in the ADH and ALDH genes in a large study investigating risk factors for substance use in a NA population. We assessed ancestry admixture and tested for associations between alcohol-related phenotypes in the genomic regions around the ADH1-7 and ALDH2 and ALDH1A1 genes. Seventy-two ADH variants showed significant evidence of association with a severity level of alcohol drinking-related dependence symptoms phenotype. These significant variants spanned across the entire 7 ADH gene cluster regions. Two significant associations, one in ADH and one in ALDH2, were observed with alcohol dependence diagnosis. Seventeen variants showed significant association with the largest number of alcohol drinks ingested during any 24-hour period. Variants in or near ADH7 were significantly negatively associated with alcohol-related phenotypes, suggesting a potential protective effect of this gene. In addition, our results suggested that a higher degree of NA ancestry is associated with higher frequencies of potential risk variants and lower frequencies of potential protective variants for alcohol dependence phenotypes.


Subjects
Alcohol Dehydrogenase/genetics; Alcoholism/genetics; Aldehyde Dehydrogenase/genetics; Genetic Variation/genetics; Indians, North American/genetics; Polymorphism, Genetic/genetics; Adolescent; Adult; Aged; Aged, 80 and over; Female; Humans; Male; Middle Aged; Phenotype; Sequence Analysis, DNA; Young Adult
18.
Genet Med; 15(1): 36-44, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22995991

ABSTRACT

PURPOSE: Next-generation sequencing has transformed genetic research and is poised to revolutionize clinical diagnosis. However, the vast amount of data and inevitable discovery of incidental findings require novel analytic approaches. We therefore implemented for the first time a strategy that utilizes an a priori structured framework and a conservative threshold for selecting clinically relevant incidental findings. METHODS: We categorized 2,016 genes linked with Mendelian diseases into "bins" based on clinical utility and validity, and used a computational algorithm to analyze 80 whole-genome sequences in order to explore the use of such an approach in a simulated real-world setting. RESULTS: The algorithm effectively reduced the number of variants requiring human review and identified incidental variants with likely clinical relevance. Incorporation of the Human Gene Mutation Database improved the yield for missense mutations but also revealed that a substantial proportion of purported disease-causing mutations were misleading. CONCLUSION: This approach is adaptable to any clinically relevant bin structure, scalable to the demands of a clinical laboratory workflow, and flexible with respect to advances in genomics. We anticipate that application of this strategy will facilitate pretest informed consent, laboratory analysis, and posttest return of results in a clinical context.


Subjects
Genome-Wide Association Study/methods; Genomics/methods; Algorithms; Alleles; Databases, Genetic; Gene Frequency; Humans; Mutation
19.
J Clin Transl Sci; 7(1): e214, 2023.
Article in English | MEDLINE | ID: mdl-37900350

ABSTRACT

Knowledge graphs have become a common approach for knowledge representation. Yet, the application of graph methodology is elusive due to the sheer number and complexity of knowledge sources. In addition, semantic incompatibilities hinder efforts to harmonize and integrate across these diverse sources. As part of The Biomedical Translator Consortium, we have developed a knowledge graph-based question-answering system designed to augment human reasoning and accelerate translational scientific discovery: the Translator system. We have applied the Translator system to answer biomedical questions in the context of a broad array of diseases and syndromes, including Fanconi anemia, primary ciliary dyskinesia, multiple sclerosis, and others. A variety of collaborative approaches have been used to research and develop the Translator system. One recent approach involved the establishment of a monthly "Question-of-the-Month (QotM) Challenge" series. Herein, we describe the structure of the QotM Challenge; the six challenges that have been conducted to date on drug-induced liver injury, cannabidiol toxicity, coronavirus infection, diabetes, psoriatic arthritis, and ATP1A3-related phenotypes; the scientific insights that have been gleaned during the challenges; and the technical issues that were identified over the course of the challenges and that can now be addressed to foster further development of the prototype Translator system. We close with a discussion on Large Language Models such as ChatGPT and highlight differences between those models and the Translator system.

20.
BMC Bioinformatics; 13: 221, 2012 Sep 04.
Article in English | MEDLINE | ID: mdl-22946927

ABSTRACT

BACKGROUND: Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. RESULTS: Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. CONCLUSION: ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration.


Subjects
High-Throughput Nucleotide Sequencing/standards; Software; Calibration; Genome; Logistic Models; Sequence Alignment
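
The ReQON abstract above describes recalibrating base quality scores with logistic regression so that they better match the probability of a sequencing error. The sketch below illustrates only the two generic ingredients: the standard Phred relation Q = -10 log10(p) and a logistic model that maps a covariate to an error probability. It is written in Python rather than R, it is not ReQON's code, and the single-covariate model, the coefficients, and the cap of 40 are assumptions chosen for illustration.

```python
import math


def phred_from_error_prob(p: float, max_q: int = 40) -> int:
    """Phred-scaled quality for an error probability, capped at max_q (assumed cap)."""
    if p <= 0.0:
        return max_q
    return min(max_q, round(-10.0 * math.log10(p)))


def error_prob_from_phred(q: float) -> float:
    """Inverse of the Phred relation: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def predicted_error_prob(reported_q: float, intercept: float, slope: float) -> float:
    """Logistic model: P(error) = 1 / (1 + exp(-(intercept + slope * reported_q)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * reported_q)))


# Hypothetical fitted coefficients; a real fit would be estimated from
# observed mismatches in the aligned reads.
intercept, slope = 0.0, -0.23

for reported_q in (10, 20, 30):
    p = predicted_error_prob(reported_q, intercept, slope)
    print(reported_q, round(p, 5), phred_from_error_prob(p))
```

In a real recalibration the model would typically include more covariates than the reported quality alone, and the coefficients would be learned from the data in the BAM file.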