Results 1 - 20 of 40
1.
J Am Stat Assoc ; 114(526): 723-734, 2019.
Article in English | MEDLINE | ID: mdl-31391793

ABSTRACT

We consider the problem of learning a conditional Gaussian graphical model in the presence of latent variables. Building on recent advances in this field, we suggest a method that decomposes the parameters of a conditional Markov random field into the sum of a sparse and a low-rank matrix. We derive convergence bounds for this estimator and show that it is well-behaved in the high-dimensional regime as well as "sparsistent" (i.e., capable of recovering the graph structure). We then show how proximal gradient algorithms and semi-definite programming techniques can be employed to fit the model to thousands of variables. Through extensive simulations, we illustrate the conditions required for identifiability and show that there is a wide range of situations in which this model performs significantly better than its counterparts, for example, by accommodating more latent variables. Finally, the suggested method is applied to two datasets comprising individual-level data on genetic variants and metabolite levels. We show that our results replicate better than those of alternative approaches and are enriched for biological signal. Supplementary materials for this article are available online.
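
The proximal gradient approach mentioned above alternates a gradient step on the smooth part of the objective with proximal steps on the penalties. Assuming the usual ℓ1 penalty on the sparse component and nuclear-norm penalty on the low-rank component (an assumption; the abstract does not state the exact penalties), the two proximal operators are elementwise soft-thresholding and singular value thresholding. The NumPy sketch below shows these operators and one hypothetical update step; it is an illustration, not the authors' implementation.

```python
import numpy as np


def soft_threshold(S: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||S||_1: shrink every entry toward zero by lam."""
    return np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)


def singular_value_threshold(L: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||L||_* (nuclear norm): shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # U * s_shrunk scales column j of U by s_shrunk[j], i.e. U @ diag(s_shrunk).
    return (U * s_shrunk) @ Vt


def prox_gradient_step(S: np.ndarray, L: np.ndarray, grad: np.ndarray,
                       step: float, lam: float, tau: float):
    """One hypothetical proximal gradient step for a smooth loss f(S + L).

    Because the loss depends on S + L, its gradient with respect to S and to L is
    the same matrix `grad`; each component then gets its own proximal step.
    """
    S_new = soft_threshold(S - step * grad, step * lam)
    L_new = singular_value_threshold(L - step * grad, step * tau)
    return S_new, L_new
```

Singular values pushed to zero drop out of the reconstruction, which is how the low-rank component keeps its rank small.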

2.
Genet Epidemiol ; 43(5): 532-547, 2019 07.
Article in English | MEDLINE | ID: mdl-30920090

ABSTRACT

Genome-wide association studies (GWAS) are a powerful tool for understanding the genetic basis of diseases and traits, but most studies have been conducted in isolation, with a focus on either a single phenotype or a set of closely related phenotypes. We describe MetABF, a simple Bayesian framework for performing integrative meta-analysis across multiple GWAS using summary statistics. The approach is applicable across a wide range of study designs and can increase power by 50% compared with standard frequentist tests when only a subset of studies has a true effect. We demonstrate its utility in a meta-analysis of 20 diverse GWAS that were part of the Wellcome Trust Case Control Consortium 2. The novelty of the approach is its ability to explore, and assess the evidence for, a range of possible true patterns of association across studies in a computationally efficient framework.
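
MetABF works from GWAS summary statistics. As a rough illustration of how such evidence can be combined, the sketch below uses Wakefield's approximate Bayes factor (a separate, published method swapped in here) for each study and, assuming independent studies and independent true effects, scores each pattern of "effect present/absent" across studies as the product of the per-study factors. The prior variance and function names are hypothetical, and MetABF's own priors may differ.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple


def wakefield_abf(beta: float, se: float, prior_var: float) -> float:
    """Approximate Bayes factor (association vs. no association) from an effect
    estimate and its standard error, with a N(0, prior_var) prior on the true effect."""
    v = se ** 2
    z2 = (beta / se) ** 2
    r = prior_var / (v + prior_var)
    # sqrt(V / (V + W)) * exp(z^2 / 2 * W / (V + W)), written with r = W / (V + W)
    return math.sqrt(1.0 - r) * math.exp(0.5 * z2 * r)


def pattern_bayes_factors(betas: Sequence[float], ses: Sequence[float],
                          prior_var: float) -> Dict[Tuple[int, ...], float]:
    """Bayes factor for every on/off pattern of true effects across studies.

    Illustrative assumption: studies and effects are independent, so the factor for
    a pattern is the product of the per-study factors for studies marked 1.
    """
    abfs = [wakefield_abf(b, s, prior_var) for b, s in zip(betas, ses)]
    results: Dict[Tuple[int, ...], float] = {}
    for pattern in product((0, 1), repeat=len(abfs)):
        bf = 1.0
        for on, abf in zip(pattern, abfs):
            if on:
                bf *= abf
        results[pattern] = bf
    return results
```

Enumerating all 2^k patterns is only practical for a modest number of studies; the all-zero pattern always scores 1 because it is the null reference.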


Subject(s)
Genome-Wide Association Study; Bayes Theorem; Case-Control Studies; Computer Simulation; Humans; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide/genetics
3.
J Virol ; 93(1), 2019 01 01.
Article in English | MEDLINE | ID: mdl-30333167

ABSTRACT

Accurate determination of the genetic diversity present in the HIV quasispecies is critical for the development of a preventative vaccine; in particular, little is known about viral genetic diversity for the second type of HIV, HIV-2. A better understanding of HIV-2 biology is relevant to the HIV vaccine field because a substantial proportion of infected people experience long-term viral control, and prior HIV-2 infection has been associated with slower HIV-1 disease progression in coinfected subjects. The majority of traditional and next-generation sequencing methods have relied on target amplification prior to sequencing, introducing biases that may obscure the true signals of diversity in the viral population. Additionally, target enrichment through PCR requires a priori sequence knowledge, which is lacking for HIV-2. Therefore, a target-enrichment-free method of library preparation would be valuable for the field. We applied an RNA shotgun sequencing (RNA-Seq) method without PCR amplification to cultured viral stocks and patient plasma samples from HIV-2-infected individuals. Libraries generated from total plasma RNA were analyzed with a two-step pipeline: (i) de novo genome assembly, followed by (ii) read remapping. By this approach, whole-genome sequences were generated with a 28× to 67× mean depth of coverage. Assembled reads showed a low level of GC bias, and comparison of the genome diversities at the intrahost level showed low diversity in the accessory gene vpx in all patients. Our study demonstrates that RNA-Seq is a feasible full-genome de novo sequencing method for blood plasma samples collected from HIV-2-infected individuals.
IMPORTANCE: An accurate picture of viral genetic diversity is critical for the development of a globally effective HIV vaccine. However, sequencing strategies are often complicated by target enrichment prior to sequencing, introducing biases that can distort variant frequencies and are not easily corrected for in downstream analyses. Additionally, detailed a priori sequence knowledge, which is often lacking when working with tropical diseases localized in developing countries, is needed to inform robust primer design when employing PCR amplification. Previous work has demonstrated that direct RNA shotgun sequencing (RNA-Seq) can be used to circumvent these issues for hepatitis C virus (HCV) and norovirus. We applied RNA-Seq to total RNA extracted from HIV-2 blood plasma samples, demonstrating the applicability of this technique to HIV-2 and allowing us to generate a dynamic picture of genetic diversity over the whole genome of HIV-2 in the context of low-bias sequencing.


Subject(s)
HIV Infections/virology; HIV-2/genetics; RNA, Viral/blood; Sequence Analysis, RNA/methods; Africa, Western; Bias; Female; Genome, Viral; HIV Infections/blood; HIV-2/classification; Humans; Male; Phylogeny; Quasispecies; Sequence Analysis, RNA/standards
4.
Genome Res ; 28(12): 1779-1790, 2018 12.
Article in English | MEDLINE | ID: mdl-30355600

ABSTRACT

Mosaic mutations present in the germline have important implications for reproductive risk and disease transmission. We previously demonstrated a phenomenon occurring in the male germline, whereby specific mutations arising spontaneously in stem cells (spermatogonia) lead to clonal expansion, resulting in elevated mutation levels in sperm over time. This process, termed "selfish spermatogonial selection," explains the high spontaneous birth prevalence and strong paternal age-effect of disorders such as achondroplasia and Apert, Noonan and Costello syndromes, with direct experimental evidence currently available for specific positions of six genes (FGFR2, FGFR3, RET, PTPN11, HRAS, and KRAS). We present a discovery screen to identify novel mutations and genes showing evidence of positive selection in the male germline, by performing massively parallel simplex PCR using RainDance technology to interrogate mutational hotspots in 67 genes (51.5 kb in total) in 276 biopsies of testes from five men (median age, 83 yr). Following ultradeep sequencing (about 16,000×), development of a low-frequency variant prioritization strategy, and targeted validation, we identified 61 distinct variants present at frequencies as low as 0.06%, including 54 variants not previously directly associated with selfish selection. The majority (80%) of variants identified have previously been implicated in developmental disorders and/or oncogenesis and include mutations in six newly associated genes (BRAF, CBL, MAP2K1, MAP2K2, RAF1, and SOS1), all of which encode components of the RAS-MAPK pathway and activate signaling. Our findings extend the link between mutations dysregulating the RAS-MAPK pathway and selfish selection, and show that the aging male germline is a repository for such deleterious mutations.
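
As a worked check on the numbers above: at about 16,000× depth, a variant at 0.06% frequency is expected to appear in roughly 0.0006 × 16,000 ≈ 9.6 reads. A generic way to ask whether an observed count exceeds sequencing noise is a binomial upper-tail probability under a background error rate. The sketch below uses a hypothetical error rate of 1e-4 in its usage lines; it is not the prioritization strategy the authors developed.

```python
import math


def expected_alt_reads(vaf: float, depth: int) -> float:
    """Expected number of reads carrying the variant allele at a given depth."""
    return vaf * depth


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1.

    Terms are computed in log space with lgamma, so large n (e.g. 16,000 reads)
    does not overflow the binomial coefficients.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


# Usage with hypothetical values: 10 supporting reads at 16,000x depth.
depth = 16000
mean_reads = expected_alt_reads(0.0006, depth)   # about 9.6 reads for a 0.06% variant
p_noise = binomial_upper_tail(10, depth, 1e-4)    # chance of >= 10 reads from noise alone
p_true = binomial_upper_tail(10, depth, 0.0006)   # chance a real 0.06% variant yields >= 10 reads
```

The comparison of the last two quantities shows why very deep coverage is needed: at lower depth, the read counts expected from a true variant and from background error overlap much more.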


Subject(s)
Mitogen-Activated Protein Kinases/metabolism; Mutation; Signal Transduction; Testis/metabolism; ras Proteins/metabolism; Aged; Aged, 80 and over; Genetic Variation; Humans; Male; Middle Aged
5.
PLoS One ; 12(5): e0178169, 2017.
Article in English | MEDLINE | ID: mdl-28542371

ABSTRACT

Adult male germline stem cells (spermatogonia) proliferate by mitosis and, after puberty, generate spermatocytes that undergo meiosis to produce haploid spermatozoa. Germ cells are under evolutionary constraint to curtail mutations and maintain genome integrity. Despite constant turnover, spermatogonia very rarely form tumors, the so-called spermatocytic tumors (SpTs). In line with the previous identification of FGFR3 and HRAS selfish mutations in a subset of cases, candidate gene screening of 29 SpTs identified an oncogenic NRAS mutation in two cases. To gain insight into the etiology of SpT and into properties of the male germline, we performed whole-genome sequencing of five tumors (4/5 with matched normal tissue). The acquired single nucleotide variant load was extremely low (~0.2 per Mb), with an average of 6 (range 2-9) non-synonymous variants per tumor, none of which is likely to be oncogenic. The observed mutational signature of SpTs is strikingly similar to that of germline de novo mutations, mostly involving C>T transitions with a significant enrichment in the ACG trinucleotide context. The tumors exhibited extensive aneuploidy (50-99 autosomes/tumor) involving whole chromosomes, with recurrent gains of chr9 and chr20 and loss of chr7, suggesting that aneuploidy itself represents the initiating oncogenic event. We propose that SpT etiology recapitulates the unique properties of male germ cells: because of evolutionary constraints to maintain a low point mutation rate, rare tumorigenic driver events are caused by a combination of gene imbalances mediated by whole-chromosome aneuploidy. Finally, we propose a general framework of male germ cell tumor pathology that accounts for their mutational landscape, timing and cellular origin.
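
The "ACG trinucleotide context" above refers to the common convention of describing each single nucleotide substitution together with its 5' and 3' neighboring bases, reported on the strand where the mutated reference base is a pyrimidine (C or T). A minimal sketch of that classification follows; the function names are hypothetical and this is not the authors' pipeline.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def trinucleotide_class(context: str, alt: str) -> str:
    """Label an SNV such as 'A[C>T]G'.

    `context` is the reference trinucleotide centered on the mutated base (5' base,
    reference base, 3' base) on the forward strand; `alt` is the forward-strand
    alternate base. Purine references (A/G) are switched to the opposite strand so
    every label has a C or T reference, as in 96-channel mutational signatures.
    """
    if len(context) != 3 or alt == context[1]:
        raise ValueError("need a 3-base context and an alt base differing from the reference")
    if context[1] in "AG":
        context, alt = reverse_complement(context), COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"
```

For example, a G>A change in forward-strand context CGT is the same event as a C>T change in ACG on the other strand, so both count toward the A[C>T]G class.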


Subject(s)
Biomarkers, Tumor/genetics; Genome, Human; Germ-Line Mutation/genetics; High-Throughput Nucleotide Sequencing/methods; Spermatocytes/pathology; Testicular Neoplasms/genetics; DNA Copy Number Variations/genetics; DNA Methylation; Humans; Male; Receptor, Fibroblast Growth Factor, Type 3; Sexual Maturation; Spermatocytes/metabolism; Testicular Neoplasms/pathology
6.
Nat Genet ; 49(5): 666-673, 2017 May.
Article in English | MEDLINE | ID: mdl-28394351

ABSTRACT

Outcomes of hepatitis C virus (HCV) infection and treatment depend on viral and host genetic factors. Here we use human genome-wide genotyping arrays and new whole-genome HCV viral sequencing technologies to perform a systematic genome-to-genome study of 542 individuals who were chronically infected with HCV, predominantly genotype 3. We show that both alleles of genes encoding human leukocyte antigen molecules and genes encoding components of the interferon lambda innate immune system drive viral polymorphism. Additionally, we show that IFNL4 genotypes determine HCV viral load through a mechanism dependent on a specific amino acid residue in the HCV NS5A protein. These findings highlight the interplay between the innate immune system and the viral genome in HCV control.


Subject(s)
Adaptive Immunity/genetics; Genome, Human/genetics; Genome, Viral/genetics; Hepacivirus/genetics; Hepatitis C, Chronic/genetics; Immunity, Innate/genetics; Alleles; Genetic Variation; Genotype; HLA Antigens/genetics; Hepacivirus/physiology; Hepatitis C, Chronic/virology; Host-Pathogen Interactions/genetics; Humans; Interleukins/genetics; Logistic Models; Principal Component Analysis; Viral Load/genetics; Viral Nonstructural Proteins/genetics
7.
PLoS Comput Biol ; 12(5): e1004842, 2016 05.
Article in English | MEDLINE | ID: mdl-27145223

ABSTRACT

A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present-day coalescent simulations either do not scale well or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods for storing genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.
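
The coalescence records described above can be pictured as tuples giving a genomic interval [left, right), a parent node, its children and the coalescence time. Under that assumed layout (class and function names are hypothetical), the tree at any position is simply a parent array built from the records whose interval covers that position. This naive version rebuilds the tree from scratch at each query; the efficiency described in the abstract comes from exploiting structure shared between adjacent trees, for example by updating a single parent array as records begin and end along the sequence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CoalescenceRecord:
    left: float               # start of the genomic interval (inclusive)
    right: float              # end of the genomic interval (exclusive)
    node: int                 # parent node created by this coalescence
    children: Tuple[int, ...]
    time: float               # time of the coalescence event


def sparse_tree_at(records: List[CoalescenceRecord], position: float,
                   num_nodes: int) -> List[int]:
    """Tree at `position` as a parent array: parent[u] is u's parent, or -1 if none."""
    parent = [-1] * num_nodes
    for rec in records:
        if rec.left <= position < rec.right:
            for child in rec.children:
                parent[child] = rec.node
    return parent


def mrca(parent: List[int], u: int, v: int) -> int:
    """Most recent common ancestor of u and v in a parent array (-1 if none)."""
    ancestors = set()
    while u != -1:
        ancestors.add(u)
        u = parent[u]
    while v != -1 and v not in ancestors:
        v = parent[v]
    return v
```

In a small example where node 3 joins samples 0 and 1 over the whole interval [0, 10), and the root joining 2 and 3 changes from node 4 on [0, 5) to node 5 on [5, 10), the two trees share node 3 while their roots differ; that shared structure is what the record representation stores only once.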


Subject(s)
Genetic Variation; Models, Genetic; Pedigree; Algorithms; Computational Biology; Computer Simulation; Evolution, Molecular; Genetics, Population; Humans; Recombination, Genetic; Sample Size
8.
Bioinformatics ; 32(12): 1898-900, 2016 06 15.
Article in English | MEDLINE | ID: mdl-26873930

ABSTRACT

MOTIVATION: For many classes of disease, the same genetic risk variants underlie many related phenotypes or disease subtypes. Multinomial logistic regression provides an attractive framework to analyze multi-category phenotypes and to explore the genetic relationships between these phenotype categories. We introduce Trinculo, a program that implements a wide range of multinomial analyses in a single fast package that is designed to be easy to use for users of standard genome-wide association study software. AVAILABILITY AND IMPLEMENTATION: An open source C implementation, with code and binaries for Linux and Mac OSX, is available for download at http://sourceforge.net/projects/trinculo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: lj4@well.ox.ac.uk.
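
For reference, a multinomial logistic model with K phenotype categories treats one category (here category 0, for example controls) as the baseline and models the log-odds of each other category against it as a linear function of the predictors, such as genotype dosage. The sketch below computes category probabilities from hypothetical coefficients; it illustrates the model only and is not Trinculo's code or interface.

```python
import math
from typing import List, Sequence


def multinomial_probs(x: Sequence[float], coefs: List[List[float]],
                      intercepts: List[float]) -> List[float]:
    """Category probabilities with category 0 as the reference.

    For k = 1..K-1: log(P(k) / P(0)) = intercepts[k-1] + coefs[k-1] . x
    """
    etas = [0.0] + [b0 + sum(b * xi for b, xi in zip(beta, x))
                    for b0, beta in zip(intercepts, coefs)]
    m = max(etas)  # subtract the maximum before exponentiating, for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(xs: Sequence[Sequence[float]], ys: Sequence[int],
                   coefs: List[List[float]], intercepts: List[float]) -> float:
    """Log-likelihood of observed categories `ys` given predictor rows `xs`."""
    return sum(math.log(multinomial_probs(x, coefs, intercepts)[y]) for x, y in zip(xs, ys))
```

Fitting the model means choosing the intercepts and coefficients that maximize this log-likelihood; comparing the fitted coefficients across categories is what lets one ask whether a variant acts similarly on different subtypes.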


Subject(s)
Bayes Theorem; Genome-Wide Association Study; Logistic Models; Phenotype; Software; Humans
9.
Nat Genet ; 47(3): 226-34, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25599401

ABSTRACT

We report a large multicenter genome-wide association study of Plasmodium falciparum resistance to artemisinin, the frontline antimalarial drug. Across 15 locations in Southeast Asia, we identified at least 20 mutations in kelch13 (PF3D7_1343700) affecting the encoded propeller and BTB/POZ domains, which were associated with a slow parasite clearance rate after treatment with artemisinin derivatives. Nonsynonymous polymorphisms in fd (ferredoxin), arps10 (apicoplast ribosomal protein S10), mdr2 (multidrug resistance protein 2) and crt (chloroquine resistance transporter) also showed strong associations with artemisinin resistance. Analysis of the fine structure of the parasite population showed that the fd, arps10, mdr2 and crt polymorphisms are markers of a genetic background on which kelch13 mutations are particularly likely to arise and that they correlate with the contemporary geographical boundaries and population frequencies of artemisinin resistance. These findings indicate that the risk of new resistance-causing mutations emerging is determined by specific predisposing genetic factors in the underlying parasite population.


Subject(s)
Antimalarials/pharmacology; Artemisinins/pharmacology; Genome, Protozoan; Plasmodium falciparum/drug effects; Plasmodium falciparum/genetics; Drug Resistance/genetics; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Humans; Malaria, Falciparum/drug therapy; Malaria, Falciparum/parasitology; Mutation; Polymorphism, Single Nucleotide
10.
Proc Natl Acad Sci U S A ; 110(50): 20152-7, 2013 Dec 10.
Article in English | MEDLINE | ID: mdl-24259709

ABSTRACT

The RAS proto-oncogene Harvey rat sarcoma viral oncogene homolog (HRAS) encodes a small GTPase that transduces signals from cell surface receptors to intracellular effectors to control cellular behavior. Although somatic HRAS mutations have been described in many cancers, germline mutations cause Costello syndrome (CS), a congenital disorder associated with predisposition to malignancy. Based on the epidemiology of CS and the occurrence of HRAS mutations in spermatocytic seminoma, we proposed that activating HRAS mutations become enriched in sperm through a process akin to tumorigenesis, termed selfish spermatogonial selection. To test this hypothesis, we quantified the levels, in blood and sperm samples, of HRAS mutations at the p.G12 codon and compared the results to changes at the p.A11 codon, at which activating mutations do not occur. The data strongly support the role of selection in determining HRAS mutation levels in sperm, and hence the occurrence of CS, but we also found differences from the mutation pattern in tumorigenesis. First, the relative prevalence of mutations in sperm correlates weakly with their in vitro activating properties and occurrence in cancers. Second, specific tandem base substitutions (predominantly GC>TT/AA) occur in sperm but not in cancers; genomewide analysis showed that this same mutation is also overrepresented in constitutional pathogenic and polymorphic variants, suggesting a heightened vulnerability to these mutations in the germline. We developed a statistical model to show how both intrinsic mutation rate and selfish selection contribute to the mutational burden borne by the paternal germline.


Subject(s)
Aging/genetics; Carcinogenesis/genetics; Costello Syndrome/genetics; Germ Cells/chemistry; Proto-Oncogene Proteins p21(ras)/genetics; Selection, Genetic/genetics; Adult; Aged; Aging/blood; Codon/genetics; Humans; Male; Middle Aged; Models, Statistical; Mutation/genetics; Proto-Oncogene Mas
11.
J Clin Endocrinol Metab ; 98(4): E796-800, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23450047

ABSTRACT

CONTEXT: The tumorigenic role of genetic abnormalities in sporadic pituitary nonfunctioning adenomas (NFAs), which usually originate from gonadotroph cells, is unknown. OBJECTIVE: The objective of the study was to identify somatic genetic abnormalities in sporadic pituitary NFAs. DESIGN: Whole-exome sequencing was performed using DNA from 7 pituitary NFAs and leukocyte samples obtained from the same patients. Somatic variants were confirmed by dideoxynucleotide sequencing, and candidate driver genes were assessed in an additional 24 pituitary NFAs. RESULTS: Whole-exome sequencing achieved a high degree of coverage, with approximately 97% of targeted bases represented by more than 10 reads; 24 somatic variants were identified and confirmed in the discovery set of 7 pituitary NFAs (mean 3.5 variants/tumor; range 1-7). Approximately 80% of variants were missense single nucleotide variants, and the remainder were synonymous changes or small frameshift deletions. Each of the 24 mutations occurred in a different gene, with no recurrent mutations. Mutations were not observed in genes previously associated with pituitary tumorigenesis, although somatic variants in putative driver genes including platelet-derived growth factor D (PDGFD), N-myc down-regulated gene family member 4 (NDRG4), and zipper sterile-α-motif kinase (ZAK) were identified; however, DNA sequence analysis of these genes in the validation set of 24 pituitary NFAs did not reveal any mutations, indicating that these genes are unlikely to contribute significantly to the etiology of sporadic pituitary NFAs. CONCLUSIONS: Pituitary NFAs harbor few somatic mutations, consistent with their low proliferation rates and benign nature, and mechanisms other than somatic mutation are likely to be involved in the etiology of sporadic pituitary NFAs.


Subject(s)
Adenoma/genetics; Exome/genetics; Pituitary Neoplasms/genetics; Sequence Analysis, DNA; Adenoma/epidemiology; Adenoma/physiopathology; Adult; Aged; Aged, 80 and over; DNA Mutational Analysis; Female; Gene Expression Regulation, Neoplastic; Genetic Association Studies; Humans; Male; Microarray Analysis; Middle Aged; Mutation/physiology; Pituitary Neoplasms/epidemiology; Pituitary Neoplasms/physiopathology; Sequence Analysis, DNA/methods; Transcriptome
12.
Science ; 339(6127): 1578-82, 2013 Mar 29.
Article in English | MEDLINE | ID: mdl-23413192

ABSTRACT

Instances in which natural selection maintains genetic variation in a population over millions of years are thought to be extremely rare. We conducted a genome-wide scan for long-lived balancing selection by looking for combinations of SNPs shared between humans and chimpanzees. In addition to the major histocompatibility complex, we identified 125 regions in which the same haplotypes are segregating in the two species, all but two of which are noncoding. In six cases, there is evidence for an ancestral polymorphism that persisted to the present in humans and chimpanzees. Regions with shared haplotypes are significantly enriched for membrane glycoproteins, and a similar trend is seen among shared coding polymorphisms. These findings indicate that ancient balancing selection has shaped human variation and point to genes involved in host-pathogen interactions as common targets.


Subject(s)
Genome, Human/genetics; Host-Pathogen Interactions/genetics; Pan troglodytes/genetics; Selection, Genetic; Animals; Base Sequence; Genetic Association Studies; Haplotypes; Humans; Molecular Sequence Data; Pedigree; Polymorphism, Single Nucleotide
13.
Nat Genet ; 45(2): 136-44, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23263490

ABSTRACT

Many individuals with multiple or large colorectal adenomas or early-onset colorectal cancer (CRC) have no detectable germline mutations in the known cancer predisposition genes. Using whole-genome sequencing, supplemented by linkage and association analysis, we identified specific heterozygous POLE or POLD1 germline variants in several multiple-adenoma and/or CRC cases but in no controls. The variants associated with susceptibility, POLE p.Leu424Val and POLD1 p.Ser478Asn, have high penetrance, and POLD1 mutation was also associated with endometrial cancer predisposition. The mutations map to equivalent sites in the proofreading (exonuclease) domain of DNA polymerases ɛ and δ and are predicted to cause a defect in the correction of mispaired bases inserted during DNA replication. In agreement with this prediction, the tumors from mutation carriers were microsatellite stable but tended to acquire base substitution mutations, as confirmed by yeast functional assays. Further analysis of published data showed that the recently described group of hypermutant, microsatellite-stable CRCs is likely to be caused by somatic POLE mutations affecting the exonuclease domain.


Subject(s)
Adenoma/genetics; Colorectal Neoplasms/genetics; DNA Mismatch Repair/genetics; DNA Polymerase III/genetics; DNA Polymerase II/genetics; DNA Replication/genetics; Models, Molecular; Exodeoxyribonucleases/genetics; Genetic Linkage; Genome-Wide Association Study; Germ-Line Mutation/genetics; Humans; Microsatellite Repeats/genetics; Pedigree; Poly-ADP-Ribose Binding Proteins; Schizosaccharomyces/genetics; Sequence Analysis, DNA
14.
PLoS Genet ; 8(12): e1003074, 2012.
Article in English | MEDLINE | ID: mdl-23236289

ABSTRACT

β-III spectrin is present in the brain and is known to be important in the function of the cerebellum. Heterozygous mutations in SPTBN2, the gene encoding β-III spectrin, cause Spinocerebellar Ataxia Type 5 (SCA5), an adult-onset, slowly progressive, autosomal-dominant pure cerebellar ataxia. SCA5 is sometimes known as "Lincoln ataxia," because the largest known family is descended from relatives of the United States President Abraham Lincoln. Using targeted capture and next-generation sequencing, we identified a homozygous stop codon in SPTBN2 in a consanguineous family in which childhood developmental ataxia co-segregates with cognitive impairment. The cognitive impairment could result from mutations in a second gene, but further analysis using whole-genome sequencing combined with SNP array analysis did not reveal any evidence of other mutations. We also examined a mouse knockout of β-III spectrin in which ataxia and progressive degeneration of cerebellar Purkinje cells has been previously reported and found morphological abnormalities in neurons from prefrontal cortex and deficits in object recognition tasks, consistent with the human cognitive phenotype. These data provide the first evidence that β-III spectrin plays an important role in cortical brain development and cognition, in addition to its function in the cerebellum; and we conclude that cognitive impairment is an integral part of this novel recessive ataxic syndrome, Spectrin-associated Autosomal Recessive Cerebellar Ataxia type 1 (SPARCA1). In addition, the identification of SPARCA1 and normal heterozygous carriers of the stop codon in SPTBN2 provides insights into the mechanism of molecular dominance in SCA5 and demonstrates that the cell-specific repertoire of spectrin subunits underlies a novel group of disorders, the neuronal spectrinopathies, which includes SCA5, SPARCA1, and a form of West syndrome.


Subject(s)
Cerebellum; Spectrin/genetics; Spinocerebellar Ataxias; Adult; Animals; Cerebellum/growth & development; Cerebellum/pathology; Chromosome Mapping; Cognition Disorders/genetics; Humans; Mice; Mice, Knockout; Mutation; Neurons/metabolism; Neurons/pathology; Purkinje Cells/pathology; Spinocerebellar Ataxias/genetics; Spinocerebellar Ataxias/physiopathology
15.
Nat Genet ; 44(12): 1294-301, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23104008

ABSTRACT

To further investigate susceptibility loci identified by genome-wide association studies, we genotyped 5,500 SNPs across 14 associated regions in 8,000 samples from a control group and from cases of 3 diseases: type 2 diabetes (T2D), coronary artery disease (CAD) and Graves' disease. Using Bayes' theorem, we defined credible sets of SNPs that were 95% likely, based on posterior probability, to contain the causal disease-associated SNPs. In 3 of the 14 regions, TCF7L2 (T2D), CTLA4 (Graves' disease) and CDKN2A-CDKN2B (T2D), much of the posterior probability rested on a single SNP, and, in 4 other regions (CDKN2A-CDKN2B (CAD) and CDKAL1, FTO and HHEX (T2D)), the 95% sets were small, thereby excluding most SNPs as potentially causal. Very few SNPs in our credible sets had annotated functions, illustrating the limitations in understanding the mechanisms underlying susceptibility to common diseases. Our results also show the value of more detailed mapping to target sequences for functional studies.
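
The credible-set construction can be sketched as follows, assuming (as is common in fine-mapping, though the abstract does not spell it out) a single causal SNP per region and equal prior probabilities, so that each SNP's posterior probability is its Bayes factor divided by the regional total. SNPs are then added in decreasing order of posterior probability until the cumulative probability reaches 0.95. Function names are hypothetical.

```python
from typing import Dict, List, Tuple


def posterior_probs(bayes_factors: Dict[str, float]) -> Dict[str, float]:
    """Posterior probability that each SNP is the single causal variant,
    assuming one causal SNP per region and equal priors."""
    total = sum(bayes_factors.values())
    return {snp: bf / total for snp, bf in bayes_factors.items()}


def credible_set(bayes_factors: Dict[str, float],
                 level: float = 0.95) -> Tuple[List[str], float]:
    """Smallest set of top-ranked SNPs whose posterior probabilities sum to >= level."""
    pp = posterior_probs(bayes_factors)
    chosen: List[str] = []
    cumulative = 0.0
    for snp, p in sorted(pp.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(snp)
        cumulative += p
        if cumulative >= level:
            break
    return chosen, cumulative
```

With made-up Bayes factors {"snpA": 900, "snpB": 60, "snpC": 30, "snpD": 10}, the posteriors are 0.90, 0.06, 0.03 and 0.01, so the 95% credible set is snpA and snpB, with cumulative probability about 0.96. When one SNP dominates, as described above for TCF7L2, the set can shrink to a single variant.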


Subject(s)
Coronary Artery Disease/genetics; Diabetes Mellitus, Type 2/genetics; Genetic Loci; Genetic Predisposition to Disease; Genome-Wide Association Study; Graves Disease/genetics; Alpha-Ketoglutarate-Dependent Dioxygenase FTO; Bayes Theorem; CTLA-4 Antigen/genetics; Cyclin-Dependent Kinase 5/genetics; Cyclin-Dependent Kinase Inhibitor p15/genetics; Genes, p16; Homeodomain Proteins/genetics; Humans; Polymorphism, Single Nucleotide; Proteins/genetics; Transcription Factor 7-Like 2 Protein/genetics; Transcription Factors/genetics; tRNA Methyltransferases
16.
Science ; 334(6062): 1518-24, 2011 Dec 16.
Article in English | MEDLINE | ID: mdl-22174245

ABSTRACT

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.


Subject(s)
Data Interpretation, Statistical; Algorithms; Animals; Baseball/statistics & numerical data; Female; Gene Expression; Genes, Fungal; Genomics/methods; Humans; Intestines/microbiology; Male; Metagenome; Mice; Obesity; Saccharomyces cerevisiae/genetics
17.
Bioinformatics ; 27(15): 2156-8, 2011 Aug 01.
Article in English | MEDLINE | ID: mdl-21653522

ABSTRACT

SUMMARY: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API. AVAILABILITY: http://vcftools.sourceforge.net
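
To make the record layout concrete, the sketch below parses a single VCF data line into its fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) plus the optional FORMAT and per-sample fields, as laid out in the VCF specification. It is a minimal illustration that ignores header metadata and INFO type declarations; it is not part of VCFtools, and the function name is hypothetical.

```python
from typing import Dict, Union


def parse_vcf_line(line: str) -> dict:
    """Parse one tab-separated VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]

    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True  # keys without '=' are flags

    record = {
        "chrom": chrom,
        "pos": int(pos),                             # 1-based position
        "id": None if vid == "." else vid,           # '.' marks a missing value
        "ref": ref,
        "alt": [] if alt == "." else alt.split(","),  # multiple ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
        "samples": [],
    }
    # Column 9 (FORMAT) names the colon-separated fields used by each sample column.
    if len(fields) > 9:
        keys = fields[8].split(":")
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in fields[9:]]
    return record
```

Values are kept as strings here; a fuller parser would use the header's INFO and FORMAT declarations to convert types and would handle the compressed, indexed files mentioned above.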


Subject(s)
Genetic Variation; Genomics/methods; Information Storage and Retrieval/methods; Software; Alleles; Genome, Human; Genotype; Humans
18.
Blood ; 118(3): 670-4, 2011 Jul 21.
Article in English | MEDLINE | ID: mdl-21596858

ABSTRACT

Since an association between the human leukocyte antigen (HLA) region and Hodgkin lymphoma (HL) was first reported in 1967, many studies have reported associations between HL risk and both single nucleotide polymorphism (SNP) and classic HLA allele variation in the major histocompatibility complex. However, population stratification and the extent and complexity of linkage disequilibrium within the major histocompatibility complex have hindered efforts to fine-map causal signals. Using SNP data to impute alleles at classic HLA loci, we have conducted an integrated analysis of HL risk within the HLA region in 582 early-onset HL cases and 4736 controls. We confirm that the strongest signal of association comes from a SNP located in the class II region, rs6903608 (odds ratio [OR] = 1.79, P = 6.63 × 10⁻¹⁹), which is unlikely to be driven by association to HLA-DRB, DQA, or DQB alleles. In addition, we identify independent signals at rs2281389 (OR = 1.73, P = 6.31 × 10⁻¹³), a SNP that maps closely to HLA-DPB1, and at the class II HLA allele DQA1*02:01 (OR = 0.56, P = 1.51 × 10⁻⁷). These data suggest that multiple independent loci within the HLA class II region contribute to the risk of developing early-onset HL.


Subject(s)
Chromosomes, Human, Pair 6; HLA Antigens/genetics; Hodgkin Disease/epidemiology; Hodgkin Disease/genetics; Age of Onset; Genetic Predisposition to Disease/epidemiology; Genetic Predisposition to Disease/genetics; Humans; Models, Genetic; Polymorphism, Single Nucleotide; Risk Factors
19.
Genome Biol ; 12(4): R33, 2011.
Article in English | MEDLINE | ID: mdl-21463505

ABSTRACT

BACKGROUND: The human malaria parasite Plasmodium falciparum survives pressures from the host immune system and antimalarial drugs by modifying its genome. Genetic recombination and nucleotide substitution are the two major mechanisms that the parasite employs to generate genome diversity. A better understanding of these mechanisms may provide important information for studying parasite evolution, immune evasion and drug resistance. RESULTS: Here, we used a high-density tiling array to estimate the genetic recombination rate among 32 progeny of a P. falciparum genetic cross (7G8 × GB4). We detected 638 recombination events and constructed a high-resolution genetic map. Comparing genetic and physical maps, we obtained an overall recombination rate of 9.6 kb per centimorgan and identified 54 candidate recombination hotspots. Similar to centromeres in other organisms, the sequences of P. falciparum centromeres are found in chromosome regions largely devoid of recombination activity. Motifs enriched in hotspots were also identified, including a 12-bp G/C-rich motif with 3-bp periodicity that may interact with a protein containing 11 predicted zinc finger arrays. CONCLUSIONS: These results show that the P. falciparum genome has a high recombination rate, although it also follows the overall rule of meiosis in eukaryotes with an average of approximately one crossover per chromosome per meiosis. GC-rich repetitive motifs identified in the hotspot sequences may play a role in the high recombination rate observed. The lack of recombination activity in centromeric regions is consistent with the observations of reduced recombination near the centromeres of other organisms.


Subject(s)
Crossing Over, Genetic; Meiosis/genetics; Plasmodium falciparum/genetics; Recombination, Genetic/genetics; Chromosome Mapping; Crosses, Genetic; Genetic Variation; Genome, Protozoan; Humans; Malaria/parasitology
20.
Science ; 331(6019): 920-4, 2011 Feb 18.
Article in English | MEDLINE | ID: mdl-21330547

ABSTRACT

Efforts to identify the genetic basis of human adaptations from polymorphism data have sought footprints of "classic selective sweeps" (in which a beneficial mutation arises and rapidly fixes in the population). Yet it remains unknown whether this form of natural selection was common in our evolution. We examined the evidence for classic sweeps in resequencing data from 179 human genomes. As expected under a recurrent-sweep model, we found that diversity levels decrease near exons and conserved noncoding regions. In contrast to expectation, however, the trough in diversity around human-specific amino acid substitutions is no more pronounced than around synonymous substitutions. Moreover, relative to the genome background, amino acid and putative regulatory sites are not significantly enriched in alleles that are highly differentiated between populations. These findings indicate that classic sweeps were not a dominant mode of human adaptation over the past ~250,000 years.


Subject(s)
Biological Evolution; Genetic Variation; Genome, Human; Selection, Genetic; Adaptation, Biological; Amino Acid Substitution; Chromosomes, Human, X/genetics; Conserved Sequence; Evolution, Molecular; Exons; Gene Frequency; Haplotypes; Humans; Models, Genetic; Molecular Sequence Annotation; Mutation; Polymorphism, Single Nucleotide; Recombination, Genetic; Untranslated Regions