ABSTRACT
According to the established model of murine innate lymphoid cell (ILC) development, helper ILCs develop separately from natural killer (NK) cells. However, it is unclear how helper ILCs and NK cells develop in humans. Here we elucidated key steps of NK cell, ILC2, and ILC3 development within human tonsils using ex vivo molecular and functional profiling and lineage differentiation assays. We demonstrated that while tonsillar NK cells, ILC2s, and ILC3s originated from a common CD34⁻CD117⁺ ILC precursor pool, final steps of ILC2 development deviated independently and became mutually exclusive from those of NK cells and ILC3s, whose developmental pathways overlapped. Moreover, we identified a CD34⁻CD117⁺ ILC precursor population that expressed CD56 and gave rise to NK cells and ILC3s but not to ILC2s. These data support a model of human ILC development distinct from that of the mouse, whereby human NK cells and ILC3s share a common developmental pathway separate from ILC2s.
Subject(s)
Killer Cells, Natural/immunology , Lymphocytes/immunology , Palatine Tonsil/immunology , Animals , Antigens, CD34/metabolism , CD56 Antigen/metabolism , Cell Differentiation , Cell Lineage , Cells, Cultured , Gene Expression Profiling , Humans , Immunity, Innate , Lymphocyte Activation , Mice , Proto-Oncogene Proteins c-kit/metabolism
ABSTRACT
Large-scale, population-based genomic studies have provided a context for modern medical genetics. Among such studies, however, African populations have remained relatively underrepresented. The breadth of genetic diversity across the African continent argues for an exploration of local genomic context to facilitate burgeoning disease mapping studies in Africa. We sought to characterize genetic variation and to assess population substructure within a cohort of HIV-positive children from Botswana, a Southern African country that is regionally underrepresented in genomic databases. Using whole-exome sequencing data from 164 Batswana and comparisons with 150 similarly sequenced HIV-positive Ugandan children, we found that 13%-25% of variation observed among Batswana was not captured by public databases. Uncaptured variants were significantly enriched (p = 2.2 × 10⁻¹⁶) for coding variants with minor allele frequencies between 1% and 5% and included predicted-damaging non-synonymous variants. Among variants found in public databases, corresponding allele frequencies varied widely, with Botswana having significantly higher allele frequencies among rare (<1%) pathogenic and damaging variants. Batswana clustered with other Southern African populations, but distinctly from 1000 Genomes African populations, and had limited evidence for admixture with extra-continental ancestries. We also observed a surprising lack of genetic substructure in Botswana, despite multiple tribal ethnicities and language groups, alongside a higher degree of relatedness than purported founder populations from the 1000 Genomes project. Our observations reveal a complex, but distinct, ancestral history and genomic architecture among Batswana and suggest that disease mapping within similar Southern African populations will require a deeper repository of genetic variation and allelic dependencies than presently exists.
Subject(s)
Black People/genetics , Exome Sequencing , Genetic Variation , Botswana , Cohort Studies , Gene Pool , Genetics, Population , Genome, Human , Geography , Humans , Phylogeny , Principal Component Analysis
ABSTRACT
Whole-genome sequencing (WGS) allows for a comprehensive view of the sequence of the human genome. We present and apply integrated methodologic steps for interrogating WGS data to characterize the genetic architecture of 10 heart- and blood-related traits in a sample of 1,860 African Americans. In order to evaluate the contribution of regulatory and non-protein coding regions of the genome, we conducted aggregate tests of rare variation across the entire genomic landscape using a sliding window, complemented by an annotation-based assessment of the genome using predefined regulatory elements and within the first intron of all genes. These tests were performed treating all variants equally as well as with individual variants weighted by a measure of predicted functional consequence. Significant findings were assessed in 1,705 individuals of European ancestry. After these steps, we identified and replicated components of the genomic landscape significantly associated with heart- and blood-related traits. For two traits, lipoprotein(a) levels and neutrophil count, aggregate tests of low-frequency and rare variation were significantly associated across multiple motifs. For a third trait, cardiac troponin T, investigation of regulatory domains identified a locus on chromosome 9. These practical approaches for WGS analysis led to the identification of informative genomic regions and also showed that defined non-coding regions, such as first introns of genes and regulatory domains, are associated with important risk factor phenotypes. This study illustrates the tractable nature of WGS data and outlines an approach for characterizing the genetic architecture of complex traits.
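The aggregate sliding-window test described in this abstract can be illustrated with a minimal sketch. It assumes a simple burden score (the per-sample sum of allele counts inside a window, optionally weighted per variant); the abstract does not name the specific statistic used, and the function name `window_burden`, the half-open window convention, and the example numbers are all hypothetical.

```python
def window_burden(positions, dosages, start, end, weights=None):
    """Per-sample burden score for variants with start <= position < end.

    positions: variant positions, one per variant.
    dosages:   dosages[j][i] is the allele count (0, 1 or 2) of
               variant j in sample i.
    weights:   optional per-variant weights; None treats all variants
               equally (weight 1.0), mirroring the unweighted analysis.
    """
    n_samples = len(dosages[0]) if dosages else 0
    scores = [0.0] * n_samples
    for j, pos in enumerate(positions):
        if start <= pos < end:
            w = 1.0 if weights is None else weights[j]
            for i, d in enumerate(dosages[j]):
                scores[i] += w * d
    return scores


# Hypothetical data: three variants, two samples.
positions = [100, 250, 900]
dosages = [[0, 1], [1, 1], [2, 0]]
unweighted = window_burden(positions, dosages, 0, 500)
weighted = window_burden(positions, dosages, 0, 500, weights=[0.5, 2.0, 1.0])
```

In a real analysis the per-sample scores from each window would then be tested for association with the trait; that step is omitted here.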
Subject(s)
Black or African American/genetics , Genome-Wide Association Study , Lipoprotein(a)/genetics , Troponin T/genetics , C-Reactive Protein/metabolism , Cholesterol, HDL/blood , Cholesterol, LDL/blood , Chromosomes, Human, Pair 9/genetics , Gene Frequency , Genome, Human , Genomics , Hemoglobins/metabolism , Humans , Introns , Leukocyte Count , Lipoprotein(a)/blood , Magnesium/blood , Natriuretic Peptide, Brain/blood , Natriuretic Peptide, Brain/genetics , Neutrophils/cytology , Peptide Fragments/blood , Peptide Fragments/genetics , Phosphorus/blood , Platelet Count , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Troponin T/blood , White People/genetics
ABSTRACT
BACKGROUND: The cost of whole genome sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined. RESULTS: South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high coverage of the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. CONCLUSIONS: Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.
Subject(s)
Asian People/genetics , Metagenomics , Whole Genome Sequencing , Genetic Variation , Genome, Mitochondrial/genetics , Humans
ABSTRACT
BACKGROUND: The decreasing costs of sequencing are driving the need for cost effective and real time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of the computing resources typically available to most research labs. Other infrastructures, such as the AWS cloud environment and supercomputers, also have limitations that make large-scale joint variant calling infeasible, and infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling. RESULTS: We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples were completed in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms.
CONCLUSIONS: Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
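The abstract does not describe how its binning approach partitions the genome, so the following is only a generic sketch of the idea: split each chromosome into fixed-size intervals so that joint calling can run as independent jobs per interval. The function name `make_bins`, the fixed bin size, and the 0-based half-open interval convention are assumptions for illustration, not the authors' scheme.

```python
def make_bins(chrom_lengths, bin_size):
    """Yield (chrom, start, end) intervals covering every chromosome.

    Intervals are 0-based and half-open; the last bin of a chromosome
    is truncated at the chromosome end. (Hypothetical scheme.)
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, bin_size):
            yield chrom, start, min(start + bin_size, length)


# Toy example with a made-up 25-bp chromosome and 10-bp bins.
bins = list(make_bins({"chrA": 25}, 10))
```

Each resulting interval could be dispatched to whichever platform has capacity, which is one way a hybrid infrastructure might keep data transfer between platforms small.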
Subject(s)
Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Databases, Genetic , Humans
ABSTRACT
BACKGROUND: Detection of tandem duplication within coding exons, referred to as internal tandem duplication (ITD), remains challenging due to inefficiencies in alignment of ITD-containing reads to the reference genome. There is a critical need to develop efficient methods to recover these important mutational events. RESULTS: In this paper we introduce ITD Assembler, a novel approach that rapidly evaluates all unmapped and partially mapped reads from whole exome NGS data using a de Bruijn graph approach to select reads that harbor cycles of appropriate length, followed by assembly using overlap-layout-consensus. We tested ITD Assembler using The Cancer Genome Atlas AML dataset as a truth set. ITD Assembler identified the highest percentage of reported FLT3-ITDs when compared to other ITD detection algorithms, and discovered additional ITDs in FLT3, KIT, CEBPA, WT1 and other genes. Evidence of polymorphic ITDs in 54 genes was also found. Novel ITDs were validated by analyzing the corresponding RNA sequencing data. CONCLUSIONS: ITD Assembler is a very sensitive tool which can detect partial, large and complex tandem duplications. This study highlights the need to more effectively look for ITDs in other cancers and Mendelian diseases.
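The read-selection idea (keeping reads whose de Bruijn graph contains a cycle of suitable length) can be sketched as follows. This is a simplified illustration, not the ITD Assembler implementation: it treats a single read as a walk through k-mer nodes, and whenever a k-mer recurs, the walk between the two occurrences is a closed walk whose length equals the distance between them. The function names, the value of k, and the length bounds are hypothetical.

```python
def kmer_cycle_lengths(read, k):
    """Return the set of closed-walk lengths in a read's de Bruijn graph.

    Each k-mer is a node and consecutive k-mers are joined by an edge,
    so the read is a walk. A k-mer seen again at offset i after offset j
    closes a walk of length i - j.
    """
    last_seen = {}
    lengths = set()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in last_seen:
            lengths.add(i - last_seen[kmer])
        last_seen[kmer] = i
    return lengths


def has_candidate_duplication(read, k=5, min_len=3, max_len=300):
    """True if the read has a cycle within the (assumed) length bounds."""
    return any(min_len <= n <= max_len for n in kmer_cycle_lengths(read, k))


# Toy read: a 9-base unit duplicated in tandem between two flanks.
unit = "GATTACAGG"
read_with_dup = "ACGTTGCA" + unit + unit + "TTCCA"
read_without_dup = "ACGTTGCA" + unit + "TTCCA"
```

For the toy read the only cycle length found equals the length of the duplicated unit. Real reads would also need handling of sequencing errors and low-complexity repeats, which this sketch ignores.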
Subject(s)
Algorithms , Leukemia, Myeloid, Acute/genetics , Mutation , Tandem Repeat Sequences , fms-Like Tyrosine Kinase 3/genetics , Exome , Exons , Humans , Leukemia, Myeloid, Acute/diagnosis , Molecular Diagnostic Techniques
ABSTRACT
Characterizing meiotic recombination rates across the genomes of nonhuman primates is important for understanding the genetics of primate populations, performing genetic analyses of phenotypic variation and reconstructing the evolution of human recombination. Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primates in biomedical research. We constructed a high-resolution genetic map of the rhesus genome based on whole genome sequence data from Indian-origin rhesus macaques. The genetic markers used were approximately 18 million SNPs, with marker density 6.93 per kb across the autosomes. We report that the genome-wide recombination rate in rhesus macaques is significantly lower than rates observed in apes or humans, while the distribution of recombination across the macaque genome is more uniform. These observations provide new comparative information regarding the evolution of recombination in primates.
Subject(s)
Evolution, Molecular , Macaca mulatta/genetics , Meiosis/genetics , Recombination, Genetic , Animals , Chromosome Mapping , Genetic Markers , Genetic Variation , Genome , Humans , Polymorphism, Single Nucleotide , Species Specificity , Whole Genome Sequencing
ABSTRACT
The genomic and clinical information used to develop and implement therapeutic approaches for acute myelogenous leukemia (AML) originated primarily from adult patients and has been generalized to patients with pediatric AML. However, age-specific molecular alterations are becoming more evident and may signify the need to age-stratify treatment regimens. The NCI/COG TARGET-AML initiative used whole exome capture sequencing (WXS) to interrogate the genomic landscape of matched trios representing specimens collected upon diagnosis, remission, and relapse from 20 cases of de novo childhood AML. One hundred forty-five somatic variants at diagnosis (median 6 mutations/patient) and 149 variants at relapse (median 6.5 mutations) were identified and verified by orthogonal methodologies. Recurrent somatic variants [in ≥2 patients] were identified for 10 genes (FLT3, NRAS, PTPN11, WT1, TET2, DHX15, DHX30, KIT, ETV6, KRAS), with variable persistence at relapse. The variant allele fraction (VAF), used to measure the prevalence of somatic mutations, varied widely at diagnosis. Mutations that persisted from diagnosis to relapse had a significantly higher diagnostic VAF compared with those that resolved at relapse (median VAF 0.43 vs. 0.24, P < 0.001). Further analysis revealed that 90% of the diagnostic variants with VAF >0.4 persisted to relapse compared with 28% with VAF <0.2 (P < 0.001). This study demonstrates significant variability in the mutational profile and clonal evolution of pediatric AML from diagnosis to relapse. Furthermore, mutations with high VAF at diagnosis, representing variants shared across a leukemic clonal structure, may constrain the genomic landscape at relapse and help to define key pathways for therapeutic targeting. Cancer Res; 76(8); 2197-205. ©2016 AACR.
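As a small worked illustration of the VAF measure, the sketch below computes VAF as the number of reads supporting the variant divided by the total reads at that site, then assigns the bands the abstract compares (>0.4 and <0.2). The read counts are invented, and the function names and the "intermediate" label for values between the two thresholds are assumptions.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def vaf_band(vaf):
    """Label a VAF using the two thresholds quoted in the abstract."""
    if vaf > 0.4:
        return "high"
    if vaf < 0.2:
        return "low"
    return "intermediate"


# Hypothetical read counts at 100x depth.
persisting = variant_allele_fraction(43, 100)
resolving = variant_allele_fraction(24, 100)
```

With these made-up counts the first variant falls in the band where the abstract reports 90% persistence to relapse, while the second lies between the two reported thresholds.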