Results 1 - 20 of 21
1.
Sci Rep ; 12(1): 20167, 2022 11 23.
Article in English | MEDLINE | ID: mdl-36424512

ABSTRACT

To create a scientific resource of expression quantitative trait loci (eQTL), we conducted a genome-wide association study (GWAS) using genotypes obtained from whole genome sequencing (WGS) of DNA and gene expression levels from RNA sequencing (RNA-seq) of whole blood in 2622 participants in the Framingham Heart Study. We identified 6,778,286 cis-eQTL variant-gene transcript (eGene) pairs at p < 5 × 10⁻⁸ (2,855,111 unique cis-eQTL variants and 15,982 unique eGenes) and 1,469,754 trans-eQTL variant-eGene pairs at p < 1e-12 (526,056 unique trans-eQTL variants and 7233 unique eGenes). In addition, 442,379 cis-eQTL variants were associated with expression of 1518 long non-protein coding RNAs (lncRNAs). Gene Ontology (GO) analyses revealed that the top GO terms for cis-eGenes are enriched for immune functions (FDR < 0.05). The cis-eQTL variants are enriched for SNPs reported to be associated with 815 traits in prior GWAS, including cardiovascular disease risk factors. As proof of concept, we used this eQTL resource in conjunction with genetic variants from public GWAS databases in causal inference testing (e.g., COVID-19 severity). After Bonferroni correction, Mendelian randomization analyses identified putative causal associations of 60 eGenes with systolic blood pressure, 13 genes with coronary artery disease, and seven genes with COVID-19 severity. This study created a comprehensive eQTL resource via BioData Catalyst that will be made available to the scientific community. This will advance understanding of the genetic architecture of gene expression underlying a wide range of diseases.
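
The Bonferroni step mentioned in the abstract can be illustrated with a minimal sketch. The gene labels, p-values, and the family-wise alpha of 0.05 below are hypothetical placeholders, not values from the study; the abstract does not state how many tests were corrected for.

```python
from typing import Dict, List


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return the names whose p-value passes a Bonferroni-corrected threshold.

    The threshold is alpha divided by the number of tests performed.
    """
    n_tests = len(p_values)
    if n_tests == 0:
        return []
    threshold = alpha / n_tests
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical example: four tests, so the corrected threshold is 0.05 / 4.
example = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.2, "GENE_D": 0.012}
significant = bonferroni_significant(example)
```

With four tests, only the two hypothetical genes whose p-values fall below 0.0125 are retained.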


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Quantitative Trait Loci , Humans , DNA , Gene Expression , Quantitative Trait Loci/genetics , Sequence Analysis, RNA
2.
Sci Data ; 9(1): 532, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36050327

ABSTRACT

Identifying relevant studies and harmonizing datasets are major hurdles for data reuse. Common Data Elements (CDEs) can help identify comparable study datasets and reduce the burden of retrospective data harmonization, but they have not been required, historically. The collaborative team at PhenX and dbGaP developed an approach to use PhenX variables as a set of CDEs to link phenotypic data and identify comparable studies in dbGaP. Variables were identified as either comparable or related, based on the data collection mode used to harmonize data across mapped datasets. We further added a CDE data field in the dbGaP data submission packet to indicate use of PhenX and annotate linkages in the future. Some 13,653 dbGaP variables from 521 studies were linked through PhenX variable mapping. These variable linkages have been made accessible for browsing and searching in the repository through dbGaP CDE-faceted search filter and the PhenX variable search tool. New features in dbGaP and PhenX enable investigators to identify variable linkages among dbGaP studies and reveal opportunities for cross-study analysis.


Subject(s)
Data Collection , Datasets as Topic , Retrospective Studies
3.
Res Sq ; 2022 May 31.
Article in English | MEDLINE | ID: mdl-35664994

ABSTRACT

To create a scientific resource of expression quantitative trait loci (eQTL), we conducted a genome-wide association study (GWAS) using genotypes obtained from whole genome sequencing (WGS) of DNA and gene expression levels from RNA sequencing (RNA-seq) of whole blood in 2622 participants in the Framingham Heart Study. We identified 6,778,286 cis-eQTL variant-gene transcript (eGene) pairs at p < 5 × 10⁻⁸ (2,855,111 unique cis-eQTL variants and 15,982 unique eGenes) and 1,469,754 trans-eQTL variant-eGene pairs at p < 1e-12 (526,056 unique trans-eQTL variants and 7,233 unique eGenes). In addition, 442,379 cis-eQTL variants were associated with expression of 1518 long non-protein coding RNAs (lncRNAs). Gene Ontology (GO) analyses revealed that the top GO terms for cis-eGenes are enriched for immune functions (FDR < 0.05). The cis-eQTL variants are enriched for SNPs reported to be associated with 815 traits in prior GWAS, including cardiovascular disease risk factors. As proof of concept, we used this eQTL resource in conjunction with genetic variants from public GWAS databases in causal inference testing (e.g., COVID-19 severity). After Bonferroni correction, Mendelian randomization analyses identified putative causal associations of 60 eGenes with systolic blood pressure, 13 genes with coronary artery disease, and seven genes with COVID-19 severity. This study created a comprehensive eQTL resource via BioData Catalyst that will be made available to the scientific community. This will advance understanding of the genetic architecture of gene expression underlying a wide range of diseases.

4.
medRxiv ; 2022 May 03.
Article in English | MEDLINE | ID: mdl-35547845

ABSTRACT

To create a scientific resource of expression quantitative trait loci (eQTL), we conducted a genome-wide association study (GWAS) using genotypes obtained from whole genome sequencing (WGS) of DNA and gene expression levels from RNA sequencing (RNA-seq) of whole blood in 2622 participants in the Framingham Heart Study. We identified 6,778,286 cis-eQTL variant-gene transcript (eGene) pairs at p < 5 × 10⁻⁸ (2,855,111 unique cis-eQTL variants and 15,982 unique eGenes) and 1,469,754 trans-eQTL variant-eGene pairs at p < 1e-12 (526,056 unique trans-eQTL variants and 7,233 unique eGenes). In addition, 442,379 cis-eQTL variants were associated with expression of 1518 long non-protein coding RNAs (lncRNAs). Gene Ontology (GO) analyses revealed that the top GO terms for cis-eGenes are enriched for immune functions (FDR < 0.05). The cis-eQTL variants are enriched for SNPs reported to be associated with 815 traits in prior GWAS, including cardiovascular disease risk factors. As proof of concept, we used this eQTL resource in conjunction with genetic variants from public GWAS databases in causal inference testing (e.g., COVID-19 severity). After Bonferroni correction, Mendelian randomization analyses identified putative causal associations of 60 eGenes with systolic blood pressure, 13 genes with coronary artery disease, and seven genes with COVID-19 severity. This study created a comprehensive eQTL resource via BioData Catalyst that will be made available to the scientific community. This will advance understanding of the genetic architecture of gene expression underlying a wide range of diseases.

5.
G3 (Bethesda) ; 9(8): 2447-2461, 2019 08 08.
Article in English | MEDLINE | ID: mdl-31151998

ABSTRACT

Inferring subject ancestry using genetic data is an important step in genetic association studies, required for dealing with population stratification. It has become more challenging to infer subject ancestry quickly and accurately since large amounts of genotype data, collected from millions of subjects by thousands of studies using different methods, are accessible to researchers from repositories such as the database of Genotypes and Phenotypes (dbGaP) at the National Center for Biotechnology Information (NCBI). Study-reported populations submitted to dbGaP are often not harmonized across studies or may be missing. Widely-used methods for ancestry prediction assume that most markers are genotyped in all subjects, but this assumption is unrealistic if one wants to combine studies that used different genotyping platforms. To provide ancestry inference and visualization across studies, we developed a new method, GRAF-pop, of ancestry prediction that is robust to missing genotypes and allows researchers to visualize predicted population structure in color and in three dimensions. When genotypes are dense, GRAF-pop is comparable in quality and running time to existing ancestry inference methods EIGENSTRAT, FastPCA, and FlashPCA2, all of which rely on principal components analysis (PCA). When genotypes are not dense, GRAF-pop gives much better ancestry predictions than the PCA-based methods. GRAF-pop employs basic geometric and probabilistic methods; the visualized ancestry predictions have a natural geometric interpretation, which is lacking in PCA-based methods. Since February 2018, GRAF-pop has been successfully incorporated into the dbGaP quality control process to identify inconsistencies between study-reported and computationally predicted populations and to provide harmonized population values in all new dbGaP submissions amenable to population prediction, based on marker genotypes. 
Plots of summary population predictions produced by GRAF-pop are available on dbGaP study pages, and the software is available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/Software.cgi.


Subject(s)
Databases, Genetic , Genetic Association Studies/methods , Software , Algorithms , Cluster Analysis , Genetics, Population , Genome-Wide Association Study , Humans , Principal Component Analysis , Reproducibility of Results
6.
PLoS One ; 12(6): e0179106, 2017.
Article in English | MEDLINE | ID: mdl-28609482

ABSTRACT

Genome-wide association studies (GWAS) usually rely on the assumption that different samples are not from closely related individuals. Detection of duplicates and close relatives becomes more difficult both statistically and computationally when one wants to combine datasets that may have been genotyped on different platforms. The dbGaP repository at the National Center for Biotechnology Information (NCBI) contains datasets from hundreds of studies with over one million samples. There are many duplicates and closely related individuals both within and across studies from different submitters. Relationships between studies cannot always be identified by the submitters of individual datasets. To aid in curation of dbGaP, we developed a rapid statistical method called Genetic Relationship and Fingerprinting (GRAF) to detect duplicates and closely related samples, even when the sets of genotyped markers differ and the DNA strand orientations are unknown. GRAF extracts genotypes of 10,000 informative and independent SNPs from genotype datasets obtained using different methods, and implements quick algorithms that enable it to find all of the duplicate pairs from more than 880,000 samples within and across dbGaP studies in less than two hours. In addition, GRAF uses two statistical metrics called All Genotype Mismatch Rate (AGMR) and Homozygous Genotype Mismatch Rate (HGMR) to determine subject relationships directly from the observed genotypes, without estimating probabilities of identity by descent (IBD), or kinship coefficients, and compares the predicted relationships with those reported in the pedigree files. We implemented GRAF in a freely available C++ program of the same name. In this paper, we describe the methods in GRAF and validate the usage of GRAF on samples from the dbGaP repository. Other scientists can use GRAF on their own samples and in combination with samples downloaded from dbGaP.
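
The two mismatch metrics named in the abstract can be sketched as follows. This is an illustration and not the GRAF C++ implementation: the abstract does not define the metrics, so the sketch assumes AGMR is the fraction of differing genotypes among SNPs called in both samples, and HGMR is the same fraction restricted to SNPs where both samples are homozygous. The genotype encoding (0, 1, 2 alternate-allele counts, `None` for missing) is also an assumption.

```python
from typing import Optional, Sequence, Tuple

# Assumed encoding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def mismatch_rates(
    sample_a: Sequence[Genotype], sample_b: Sequence[Genotype]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (AGMR, HGMR) for two samples genotyped at the same ordered SNPs.

    A rate is None when no SNP qualifies for its denominator.
    """
    all_compared = all_mismatch = 0
    hom_compared = hom_mismatch = 0
    for ga, gb in zip(sample_a, sample_b):
        if ga is None or gb is None:
            continue  # skip SNPs missing in either sample
        all_compared += 1
        all_mismatch += int(ga != gb)
        if ga in (0, 2) and gb in (0, 2):  # both homozygous
            hom_compared += 1
            hom_mismatch += int(ga != gb)
    agmr = all_mismatch / all_compared if all_compared else None
    hgmr = hom_mismatch / hom_compared if hom_compared else None
    return agmr, hgmr


# Hypothetical genotypes for two samples at six SNPs.
agmr, hgmr = mismatch_rates([0, 1, 2, None, 2, 0], [0, 2, 0, 1, 2, None])
```

Because missing calls are skipped per pair, the comparison still works when the two samples were genotyped on marker sets that only partly overlap, which is the situation the abstract describes.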


Subject(s)
Algorithms , Computational Biology/methods , Data Mining/methods , Databases, Nucleic Acid/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Genotype , Humans , Reproducibility of Results
7.
Genome Biol ; 18(1): 16, 2017 01 25.
Article in English | MEDLINE | ID: mdl-28122634

ABSTRACT

BACKGROUND: Identification of single nucleotide polymorphisms (SNPs) associated with gene expression levels, known as expression quantitative trait loci (eQTLs), may improve understanding of the functional role of phenotype-associated SNPs in genome-wide association studies (GWAS). The small sample sizes of some previous eQTL studies have limited their statistical power. We conducted an eQTL investigation of microarray-based gene and exon expression levels in whole blood in a cohort of 5257 individuals, exceeding the single cohort size of previous studies by more than a factor of 2. RESULTS: We detected over 19,000 independent lead cis-eQTLs and over 6000 independent lead trans-eQTLs, targeting over 10,000 gene targets (eGenes), with a false discovery rate (FDR) < 5%. Of previously published significant GWAS SNPs, 48% are identified to be significant eQTLs in our study. Some trans-eQTLs point toward novel mechanistic explanations for the association of the SNP with the GWAS-related phenotype. We also identify 59 distinct blocks or clusters of trans-eQTLs, each targeting the expression of sets of six to 229 distinct trans-eGenes. Ten of these sets of target genes are significantly enriched for microRNA targets (FDR < 5%). Many of these clusters are associated in GWAS with multiple phenotypes. CONCLUSIONS: These findings provide insights into the molecular regulatory patterns involved in human physiology and pathophysiology. We illustrate the value of our eQTL database in the context of a recent GWAS meta-analysis of coronary artery disease and provide a list of targeted eGenes for 21 of 58 GWAS loci.
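
The abstract reports results at a false discovery rate below 5% without naming the procedure used. As one common way to control FDR, and only as an assumed illustration, the Benjamini-Hochberg step-up procedure can be sketched as below; the p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Return one flag per input p-value: True if rejected at the given FDR.

    Step-up rule: find the largest rank k (1-based, ascending p-values) with
    p_(k) <= k * fdr / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four tests.
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.2])
```

In this example only the smallest p-value clears its rank-specific threshold, so a single test is flagged.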


Subject(s)
Gene Expression , Genetic Predisposition to Disease , Genome-Wide Association Study , Genomics , Quantitative Trait Loci , Adult , Aged , Alleles , Cluster Analysis , Female , Gene Expression Profiling , Gene Frequency , Genome-Wide Association Study/methods , Genomics/methods , Humans , Male , MicroRNAs/genetics , Middle Aged , Polymorphism, Single Nucleotide , Regulatory Sequences, Nucleic Acid , Reproducibility of Results , Web Browser
9.
PLoS One ; 9(7): e97282, 2014.
Article in English | MEDLINE | ID: mdl-24988075

ABSTRACT

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies.


Subject(s)
Genetic Variation , Genome, Human , HLA Antigens/genetics , Alleles , Databases, Genetic , Genotype , Haplotypes , Histocompatibility Testing , Human Genome Project , Humans , Linkage Disequilibrium , Major Histocompatibility Complex/genetics , Polymorphism, Single Nucleotide , Principal Component Analysis
10.
Nucleic Acids Res ; 42(Database issue): D975-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24297256

ABSTRACT

The Database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap) is a National Institutes of Health-sponsored repository charged to archive, curate and distribute information produced by studies investigating the interaction of genotype and phenotype. Information in dbGaP is organized as a hierarchical structure and includes the accessioned objects, phenotypes (as variables and datasets), various molecular assay data (SNP and Expression Array data, Sequence and Epigenomic marks), analyses and documents. Publicly accessible metadata about submitted studies, summary level data, and documents related to studies can be accessed freely on the dbGaP website. Individual-level data are accessible via Controlled Access application to scientists across the globe.


Subject(s)
Databases, Genetic , Genotype , Phenotype , Humans , Internet , National Library of Medicine (U.S.) , United States
11.
Nucleic Acids Res ; 40(Database issue): D13-25, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22140104

ABSTRACT

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Databases as Topic , Databases, Genetic , Databases, Protein , Gene Expression , Genomics , Internet , Models, Molecular , National Library of Medicine (U.S.) , Periodicals as Topic , PubMed , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, Protein , Sequence Analysis, RNA , Small Molecule Libraries , United States
12.
Nat Rev Genet ; 12(10): 730-6, 2011 09 16.
Article in English | MEDLINE | ID: mdl-21921928

ABSTRACT

Access to genetic data across studies is an important aspect of identifying new genetic associations through genome-wide association studies (GWASs). Meta-analysis across multiple GWASs with combined cohort sizes of tens of thousands of individuals often uncovers many more genome-wide associated loci than the original individual studies; this emphasizes the importance of tools and mechanisms for data sharing. However, even sharing summary-level data, such as allele frequencies, inherently carries some degree of privacy risk to study participants. Here we discuss mechanisms and resources for sharing data from GWASs, particularly focusing on approaches for assessing and quantifying the privacy risks to participants that result from the sharing of summary-level data.


Subject(s)
Data Collection , Genetic Variation , Genome-Wide Association Study , Information Dissemination/methods , Cohort Studies , Confidentiality , Data Collection/legislation & jurisprudence , Databases, Genetic , Genetic Variation/physiology , Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Humans , Information Dissemination/legislation & jurisprudence , Meta-Analysis as Topic , Polymorphism, Single Nucleotide , Risk Assessment
13.
Nucleic Acids Res ; 39(Database issue): D38-51, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097890

ABSTRACT

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , Databases, Protein , Gene Expression , Genomics , National Library of Medicine (U.S.) , Protein Structure, Tertiary , PubMed , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, RNA , Software , Systems Integration , United States
14.
Nucleic Acids Res ; 38(Database issue): D5-16, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19910364

ABSTRACT

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Algorithms , Animals , Computational Biology/trends , Databases, Protein , Genome, Bacterial , Genome, Viral , Humans , Information Storage and Retrieval/methods , Internet , National Institutes of Health (U.S.) , National Library of Medicine (U.S.) , Software , United States
15.
Hum Mol Genet ; 19(4): 707-19, 2010 Feb 15.
Article in English | MEDLINE | ID: mdl-19933168

ABSTRACT

We describe a novel approach to genetic association analyses with proteins sub-divided into biologically relevant smaller sequence features (SFs), and their variant types (VTs). SFVT analyses are particularly informative for study of highly polymorphic proteins such as the human leukocyte antigen (HLA), given the nature of its genetic variation: the high level of polymorphism, the pattern of amino acid variability, and that most HLA variation occurs at functionally important sites, as well as its known role in organ transplant rejection, autoimmune disease development and response to infection. Further, combinations of variable amino acid sites shared by several HLA alleles (shared epitopes) are most likely better descriptors of the actual causative genetic variants. In a cohort of systemic sclerosis patients/controls, SFVT analysis shows that a combination of SFs implicating specific amino acid residues in peptide binding pockets 4 and 7 of HLA-DRB1 explains much of the molecular determinant of risk.


Subject(s)
Genetic Variation , HLA Antigens/genetics , Scleroderma, Systemic/genetics , HLA Antigens/chemistry , HLA-DR Antigens/chemistry , HLA-DR Antigens/genetics , HLA-DRB1 Chains , Humans , Molecular Conformation
16.
PLoS One ; 4(4): e5225, 2009.
Article in English | MEDLINE | ID: mdl-19381300

ABSTRACT

Commercial SNP microarrays now provide comprehensive and affordable coverage of the human genome. However, some diseases have biologically relevant genomic regions that may require additional coverage. Addiction, for example, is thought to be influenced by complex interactions among many relevant genes and pathways. We have assembled a list of 486 biologically relevant genes nominated by a panel of experts on addiction. We then added 424 genes that showed evidence of association with addiction phenotypes through mouse QTL mappings and gene co-expression analysis. We demonstrate that there are a substantial number of SNPs in these genes that are not well represented by commercial SNP platforms. We address this problem by introducing a publicly available SNP database for addiction. The database is annotated using numeric prioritization scores indicating the extent of biological relevance. The scores incorporate a number of factors such as SNP/gene functional properties (including synonymy and promoter regions), data from mouse systems genetics and measures of human/mouse evolutionary conservation. We then used HapMap genotyping data to determine if a SNP is tagged by a commercial microarray through linkage disequilibrium. This combination of biological prioritization scores and LD tagging annotation will enable addiction researchers to supplement commercial SNP microarrays to ensure comprehensive coverage of biologically relevant regions.


Subject(s)
Behavior, Addictive/genetics , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Animals , Humans , Mice , Quantitative Trait Loci
17.
Nucleic Acids Res ; 37(Database issue): D5-15, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18940862

ABSTRACT

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , Gene Expression , Genes , Genomics , Genotype , National Library of Medicine (U.S.) , Phenotype , Protein Structure, Tertiary , Proteomics , PubMed , Sequence Homology , Systems Integration , United States
18.
Nucleic Acids Res ; 36(Database issue): D13-21, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18045790

ABSTRACT

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , National Library of Medicine (U.S.) , Animals , Databases, Nucleic Acid , Gene Expression , Genomics , Genotype , Humans , Internet , Models, Molecular , Phenotype , Proteomics , Sequence Alignment , United States
19.
Nat Genet ; 39(10): 1181-6, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17898773

ABSTRACT

The National Center for Biotechnology Information has created the dbGaP public repository for individual-level phenotype, exposure, genotype and sequence data and the associations between them. dbGaP assigns stable, unique identifiers to studies and subsets of information from those studies, including documents, individual phenotypic variables, tables of trait data, sets of genotype data, computed phenotype-genotype associations, and groups of study subjects who have given similar consents for use of their data.


Subject(s)
Databases, Genetic , Genotype , Phenotype , Computational Biology , Databases, Factual , National Library of Medicine (U.S.)/organization & administration , United States
20.
Genome Res ; 15(11): 1594-600, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16251470

ABSTRACT

In the attempt to understand human variation and the genetic basis of complex disease, a tremendous number of single nucleotide polymorphisms (SNPs) have been discovered and deposited into NCBI's dbSNP public database. More than 2.7 million SNPs in the database have genotype information. These data provide an invaluable resource for understanding the structure of human variation and the design of genetic association studies. The genotypes deposited to dbSNP are unphased, and thus, the haplotype information is unknown. We applied the phasing method HAP to obtain the haplotype information, block partitions, and tag SNPs for all publicly available genotype data and deposited this information into the dbSNP database. We also deposited the orthologous chimpanzee reference sequence for each predicted haplotype block computed using the UCSC BLASTZ alignments of human and chimpanzee. Using dbSNP, researchers can now easily perform analyses using multiple genotype data sets from the same genomic regions. Dense and sparse genotype data sets from the same region were combined to show that the number of common haplotypes is significantly underestimated in whole genome data sets, while the predicted haplotypes over the common SNPs are consistent between studies. To validate the accuracy of the predictions, we benchmarked HAP's running time and phasing accuracy against PHASE. Although HAP is slightly less accurate than PHASE, HAP is over 1000 times faster than PHASE, making it suitable for application to the entire set of genotypes in dbSNP.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genomics/methods , Haplotypes/genetics , Pan troglodytes/genetics , Polymorphism, Single Nucleotide/genetics , Animals , Genotype , Humans