ABSTRACT
CCCTC-binding factor (CTCF) binding sites are hotspots of genome instability. Although many factors have been associated with CTCF binding site fragility, no study has integrated all fragility-related factors to understand the mechanism(s) of how they work together. Using an unbiased, genome-wide approach, we found that DNA double-strand breaks (DSBs) are enriched at strong, but not weak, CTCF binding sites in five human cell types. Energetically favorable alternative DNA secondary structures underlie strong CTCF binding sites. These structures coincided with the location of topoisomerase II (TOP2) cleavage complex, suggesting that DNA secondary structure acts as a recognition sequence for TOP2 binding and cleavage at CTCF binding sites. Furthermore, CTCF knockdown significantly increased DSBs at strong CTCF binding sites and at CTCF sites that are located at topologically associated domain (TAD) boundaries. TAD boundary-associated CTCF sites that lost CTCF upon knockdown displayed increased DSBs when compared to the gained sites, and those lost sites are overrepresented with G-quadruplexes, suggesting that the structures act as boundary insulators in the absence of CTCF, and contribute to increased DSBs. These results model how alternative DNA secondary structures facilitate recruitment of TOP2 to CTCF binding sites, providing mechanistic insight into DNA fragility at CTCF binding sites.
Subject(s)
CCCTC-Binding Factor; DNA Breaks, Double-Stranded; DNA Topoisomerases, Type II; DNA; Nucleic Acid Conformation; DNA Topoisomerases, Type II/metabolism; DNA Topoisomerases, Type II/genetics; DNA Topoisomerases, Type II/chemistry; Humans; CCCTC-Binding Factor/metabolism; CCCTC-Binding Factor/genetics; Binding Sites; DNA/metabolism; DNA/chemistry; DNA/genetics; Protein Binding; Poly-ADP-Ribose Binding Proteins/metabolism; Poly-ADP-Ribose Binding Proteins/genetics; Poly-ADP-Ribose Binding Proteins/chemistry; Cell Line
ABSTRACT
Germline variants have a rich history of being studied in the context of cancer risk. Emerging studies now suggest that germline variants contribute not only to cancer risk but to tumor progression as well. In this opinion article, we discuss the initial discoveries associating germline variants with patient outcome and the mechanisms by which germline variants affect molecular pathways. Germline variants affect molecular pathways through amino acid changes, alteration of splicing patterns or expression of genes, influencing the selection for somatic mutations, and causing genome-wide mutational enrichment. These molecular alterations can lead to tumor phenotypes that become clinically apparent such as metastasis, alterations to the immune microenvironment, and modulation of therapeutic response. Overall, the growing body of evidence suggests that germline variants play a larger role in tumor progression than has been previously appreciated and that germline variation holds substantial potential for improving personalized medicine and patient outcomes.
Subject(s)
Germ-Line Mutation; Neoplasms/genetics; Genes, Tumor Suppressor; Genetic Predisposition to Disease; Humans; Neoplasms/drug therapy; Neoplasms/pathology; Pharmacogenomic Variants; Polymorphism, Single Nucleotide; Precision Medicine; Treatment Outcome
ABSTRACT
Large granular lymphocyte (LGL) leukemia comprises a group of rare lymphoproliferative disorders whose molecular landscape is incompletely defined. We leveraged paired whole-exome and transcriptome sequencing in the largest LGL leukemia cohort to date, which included 105 patients (93 T-cell receptor αβ [TCRαβ] T-LGL and 12 TCRγδ T-LGL). Seventy-six mutations were observed in 3 or more patients in the cohort; of these, STAT3, KMT2D, PIK3R1, TTN, EYS, and SULF1 mutations were shared by both subtypes. Using an unbiased driver analysis of our whole-exome cohort, we identified ARHGAP25, ABCC9, PCDHA11, SULF1, SLC6A15, DDX59, DNMT3A, FAS, KDM6A, KMT2D, PIK3R1, STAT3, STAT5B, TET2, and TNFAIP3 as recurrently mutated putative drivers. Hotspot mutations were detected in STAT3, PIK3R1, and FAS, whereas truncating mutations were observed in epigenetic modifying enzymes such as KMT2D and TET2. Moreover, STAT3 mutations co-occurred with mutations in chromatin and epigenetic modifying genes, especially KMT2D and SETD1B (P < .01 and P < .05, respectively). STAT3 was mutated in 50.5% of the patients. The most common STAT3 mutation, Y640F, was associated with lower absolute neutrophil counts, and the N647I mutation was associated with lower hemoglobin values. Somatic activating mutations (Q160P, D170Y, L287F) in the STAT3 coiled-coil domain were characterized. STAT3-mutant patients exhibited increased mutational burden and enrichment of a mutational signature associated with increased spontaneous deamination of 5-methylcytosine. Finally, gene expression analysis revealed enrichment of interferon-γ signaling and decreased phosphatidylinositol 3-kinase-Akt signaling in STAT3-mutant patients. These findings highlight the clinical and molecular heterogeneity of this rare disorder.
Subject(s)
Amino Acid Transport Systems, Neutral; Leukemia, Large Granular Lymphocytic; Amino Acid Transport Systems, Neutral/genetics; Exome; Eye Proteins/genetics; Genomics; Humans; Leukemia, Large Granular Lymphocytic/genetics; Mutation; Nerve Tissue Proteins/genetics; RNA Helicases/genetics; RNA Helicases/metabolism; Receptors, Antigen, T-Cell, alpha-beta/genetics; Receptors, Antigen, T-Cell, gamma-delta/metabolism; STAT3 Transcription Factor/genetics; STAT3 Transcription Factor/metabolism
ABSTRACT
Species across the tree of life can switch between asexual and sexual reproduction. In facultatively sexual species, the ability to switch between reproductive modes is often environmentally dependent and subject to local adaptation. However, the ecological and evolutionary factors that influence the maintenance and turnover of polymorphism associated with facultative sex remain unclear. We studied the ecological and evolutionary dynamics of reproductive investment in the facultatively sexual model species Daphnia pulex. We found that patterns of clonal diversity, but not of genetic diversity, varied among ponds, consistent with the predicted relationship between ephemerality and clonal structure. Reconstruction of a multi-year pedigree demonstrated the coexistence of clones that differ in their investment into male production. Mapping of quantitative variation in male production using lab-generated and field-collected individuals identified multiple putative quantitative trait loci (QTL) underlying this trait, and we identified a plausible candidate gene. The evolutionary history of these QTL suggests that they are relatively young and that male limitation in this system is a rapidly evolving trait. Our work highlights the dynamic nature of the genetic structure and composition of facultative sex across space and time, and suggests that quantitative genetic variation in reproductive strategy can undergo rapid evolutionary turnover.
Subject(s)
Daphnia; Reproduction; Adaptation, Physiological/genetics; Animals; Daphnia/genetics; Genetic Variation; Male; Polymorphism, Genetic; Quantitative Trait Loci; Reproduction/genetics
ABSTRACT
Chronic natural killer large granular lymphocyte (NK-LGL) leukemia, also referred to as chronic lymphoproliferative disorder of NK cells, is a rare disorder defined by prolonged expansion of clonal NK cells. Similar prevalence of STAT3 mutations in chronic T-LGL and NK-LGL leukemia is suggestive of common pathogenesis. We undertook whole-genome sequencing to identify mutations unique to NK-LGL leukemia. The results were analyzed to develop a resequencing panel that was applied to 58 patients. Phosphatidylinositol 3-kinase pathway gene mutations (PIK3CD/PIK3AP1) and TNFAIP3 mutations were seen in 5% and 10% of patients, respectively. TET2 was exceptional in that mutations were present in 16 (28%) of 58 patient samples, with evidence that TET2 mutations can be dominant and exclusive to the NK compartment. Reduced-representation bisulfite sequencing revealed that methylation patterns were significantly altered in TET2 mutant samples. The promoter of TET2 and that of PTPRD, a negative regulator of STAT3, were found to be methylated in additional cohort samples, largely confined to the TET2 mutant group. Mutations in STAT3 were observed in 19 (33%) of 58 patient samples, 7 of which had concurrent TET2 mutations. Thrombocytopenia and resistance to immunosuppressive agents were uniquely observed in those patients with only TET2 mutation (Games-Howell post hoc test, P = .0074; Fisher's exact test, P = .00466). Patients with STAT3 mutation, inclusive of those with TET2 comutation, had lower hematocrit, hemoglobin, and absolute neutrophil count compared with STAT3 wild-type patients (Welch's t test, P ≤ .015). We present the discovery of TET2 mutations in chronic NK-LGL leukemia and evidence that it identifies a unique molecular subtype.
Subject(s)
DNA-Binding Proteins/genetics; Dioxygenases/genetics; Leukemia, Large Granular Lymphocytic/genetics; Mutation; Neoplasm Proteins/genetics; Registries; Chronic Disease; DNA-Binding Proteins/blood; Dioxygenases/blood; Female; Humans; Leukemia, Large Granular Lymphocytic/blood; Male; Neoplasm Proteins/blood
ABSTRACT
BACKGROUND: The lymph node metastasis-derived LNCaP, the bone metastasis-derived PC3 (skull), and VCaP (vertebral) cell lines are widely used as preclinical models of human prostate cancer (CaP) and have been described in more than 19,000 publications. Here, we report on short-read whole-genome sequencing and genomic analyses of LNCaP, VCaP, and PC3 cells stably transduced with WT AR (PC3-AR). METHODS: LNCaP, VCaP, and PC3-AR cell lines were sequenced to an average depth of more than 30-fold using Illumina short-read sequencing. Using various computational methods, we identified and compared the single-nucleotide variants, copy-number profiles, and structural variants observed in the three cell lines. RESULTS: LNCaP cells are composed of multiple subpopulations, which results in nonintegral copy-number states and a high mutational load when the data are analyzed in bulk. All three cell lines contain pathogenic mutations and homozygous deletions in genes involved in DNA mismatch repair, along with deleterious mutations in cell-cycle, Wnt signaling, and other critical cellular processes. PC3-AR cells have a truncating mutation in TP53 and do not express the p53 protein. VCaP cells contain a homozygous gain-of-function mutation in TP53 (p.R248W) that promotes cancer invasion, metastasis, and progression and has also been observed in prostate adenocarcinomas. In addition, we detect signatures of chromothripsis of the q arms of chromosome 5 in both PC3-AR and VCaP cells, strengthening the association of TP53 inactivation with chromothripsis reported in other systems. CONCLUSIONS: Our work provides a resource for genetic, genomic, and biological studies employing these commonly used prostate cancer cell lines.
Subject(s)
Cell Line, Tumor/pathology; Neoplasm Metastasis/genetics; Prostatic Neoplasms/genetics; Prostatic Neoplasms/pathology; Whole Genome Sequencing; Adenocarcinoma/genetics; Bone Neoplasms/secondary; Cell Cycle/genetics; DNA Mismatch Repair/genetics; Gene Deletion; Humans; Lymphatic Metastasis/genetics; Male; Mutation; Neoplasm Invasiveness/genetics; PC-3 Cells; Polymorphism, Single Nucleotide/genetics
ABSTRACT
Recent work has demonstrated that two archaic human groups (Neanderthals and Denisovans) interbred with modern humans and contributed to the contemporary human gene pool. These findings relied on the availability of high-coverage genomes from both Neanderthals and Denisovans. Here we search for evidence of archaic admixture from a worldwide panel of 1,667 individuals using an approach that does not require the presence of an archaic human reference genome. We find no evidence for archaic admixture in the Andaman Islands, as previously claimed, or on the island of Flores, where Homo floresiensis fossils have been found. However, we do find evidence for at least one archaic admixture event in sub-Saharan Africa, with the strongest signal in Khoesan and Pygmy individuals from Southern and Central Africa. The locations of these putative archaic admixture tracts are weighted against functional regions of the genome, consistent with the long-term effects of purifying selection against introgressed genetic material.
Subject(s)
Black People/genetics; Fossils; Genetics, Population; Genome, Human; Hominidae/genetics; Neanderthals/genetics; Animals; Gene Pool; Humans
ABSTRACT
The identification of structural variants using short-read data remains challenging. Most approaches that use discordant paired-end sequences ignore non-trivial signatures presented by variants containing 3 breakpoints, such as those generated by various copy-paste and cut-paste mechanisms. This can result in lower precision and sensitivity in the identification of the more common structural variants such as deletions and duplications. We present SVXplorer, which uses a graph-based clustering approach streamlined by the integration of non-trivial signatures from discordant paired-end alignments, split-reads and read depth information to improve upon existing methods. We show that SVXplorer is more sensitive and precise compared to several existing approaches on multiple real and simulated datasets. SVXplorer is available for download at https://github.com/kunalkathuria/SVXplorer.
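A graph-based clustering of discordant read pairs can be illustrated with a minimal Python sketch. It assumes a much simpler model than SVXplorer. Each discordant pair is a node. Two pairs are joined by an edge when they share an orientation and both mate positions fall within a hypothetical `window` (in bp). Each connected component is then treated as one candidate variant cluster. The `DiscordantPair` record, the `window` parameter, and the orientation strings are illustrative assumptions, not SVXplorer's API. The sketch also ignores the split-read and read-depth evidence the abstract mentions.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class DiscordantPair:
    """Hypothetical record for one discordant read pair."""
    name: str
    left_pos: int      # mapped position of the left mate
    right_pos: int     # mapped position of the right mate
    orientation: str   # e.g. "+-"; this encoding is an illustrative assumption


def compatible(a: DiscordantPair, b: DiscordantPair, window: int) -> bool:
    """Pairs may support the same event if orientations match and both ends are close."""
    return (a.orientation == b.orientation
            and abs(a.left_pos - b.left_pos) <= window
            and abs(a.right_pos - b.right_pos) <= window)


def cluster_pairs(pairs: list[DiscordantPair], window: int = 300) -> list[list[str]]:
    """Return connected components of the compatibility graph, largest first."""
    parent = list(range(len(pairs)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) edge construction keeps the sketch short; a real tool would sort first.
    for i, j in combinations(range(len(pairs)), 2):
        if compatible(pairs[i], pairs[j], window):
            parent[find(i)] = find(j)

    components: dict[int, list[str]] = {}
    for i, pair in enumerate(pairs):
        components.setdefault(find(i), []).append(pair.name)
    return sorted(components.values(), key=lambda names: (-len(names), names))
```

Union-find is used here because connected components do not depend on the order in which edges are added.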
Subject(s)
Genomic Structural Variation/genetics; Genomics/methods; Recombination, Genetic/genetics; Software; Cluster Analysis; Computer Simulation; Databases, Genetic; Genome, Human/genetics; Humans; Sequence Analysis, DNA
ABSTRACT
MOTIVATION: Genomic data is frequently stored as segments or intervals. Because this data type is so common, interval-based comparisons are fundamental to genomic analysis. As the volume of available genomic data grows, efficient and scalable methods for searching interval data are necessary. RESULTS: We present a new data structure, the Augmented Interval List (AIList), to enumerate intersections between a query interval q and an interval set R. An AIList is constructed by first sorting R as a list by the interval start coordinate, then decomposing it into a few approximately flattened components (sublists), and then augmenting each sublist with the running maximum interval end. The query time for AIList is O(log₂N + n + m), where n is the number of overlaps between R and q, N is the number of intervals in the set R, and m is the average number of extra comparisons required to find the n overlaps. Tested on real genomic interval datasets, AIList code runs 5-18 times faster than standard high-performance code based on augmented interval trees, nested containment lists, or R-trees (BEDTools). For large datasets, the memory usage of AIList is 4-60% of that of other methods. The AIList data structure therefore provides a significantly improved fundamental operation for highly scalable genomic data analysis. AVAILABILITY AND IMPLEMENTATION: An implementation of the AIList data structure with both construction and search algorithms is available at http://ailist.databio.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
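The three construction steps (sort by start, decompose into sublists, augment each sublist with a running maximum end) and the matching query can be sketched in Python. This is a simplified illustration, not the published AIList code. The decomposition rule is an assumption: an interval moves to the next component when it contains at least `min_cov` of its next `lookahead` neighbours. All parameter values are likewise assumptions, and intervals are treated as closed `[start, end]`.

```python
from bisect import bisect_right


def _split(intervals, lookahead, min_cov):
    """Separate intervals that contain many of their immediate successors."""
    kept, moved = [], []
    for i, (start, end) in enumerate(intervals):
        # Successors start at or after `start`, so end2 <= end means containment.
        window = intervals[i + 1:i + 1 + lookahead]
        covered = sum(1 for _, end2 in window if end2 <= end)
        (moved if covered >= min_cov else kept).append((start, end))
    return kept, moved


class AIListSketch:
    """Simplified augmented interval list over closed intervals [start, end]."""

    def __init__(self, intervals, lookahead=20, min_cov=10, max_components=10):
        remaining = sorted(intervals)  # step 1: sort by start coordinate
        self.components = []
        while remaining:
            # Step 2: peel long, heavily overlapping intervals into later components.
            if len(self.components) == max_components - 1:
                kept, remaining = remaining, []
            else:
                kept, remaining = _split(remaining, lookahead, min_cov)
            # Step 3: augment the sublist with the running maximum end coordinate.
            starts = [s for s, _ in kept]
            max_end, running = [], float("-inf")
            for _, e in kept:
                running = max(running, e)
                max_end.append(running)
            self.components.append((kept, starts, max_end))

    def query(self, qs, qe):
        """Return every stored interval overlapping [qs, qe], sorted."""
        hits = []
        for ivs, starts, max_end in self.components:
            i = bisect_right(starts, qe) - 1  # last interval with start <= qe
            # Walk left only while some interval at or before i can still reach qs.
            while i >= 0 and max_end[i] >= qs:
                s, e = ivs[i]
                if e >= qs:
                    hits.append((s, e))
                i -= 1
        return sorted(hits)
```

The binary search accounts for the log₂N term. The backward scan stops as soon as the running maximum end drops below the query start, which is why removing long "containing" intervals into separate components keeps the number of extra comparisons (m) small.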
Subject(s)
Genomics; Software; Algorithms; Genome
ABSTRACT
Motivation: Massively parallel capture of short tandem repeats (STRs, or microsatellites) provides a strategy for high-resolution population genomic and demographic analyses, with or without a reference genome. However, the high polymerase chain reaction (PCR) cycle numbers needed for target capture experiments create genotyping noise through polymerase slippage, known as PCR stutter. Results: We developed SONiCS (Stutter mONte Carlo Simulation), a stutter-correction method based on dense forward simulations of PCR and capture experimental conditions. To test SONiCS, we genotyped a 2499-marker STR panel in 22 humpback dolphins (Sousa sahulensis) using target capture, and generated capillary-based genotypes for five of these markers for validation. In these 110 comparisons, SONiCS showed a 99.1% accuracy rate and a 98.2% genotyping success rate, miscalling a single allele in a marker with low sequence coverage and rejecting another as un-callable. Availability and implementation: Source code and documentation for SONiCS are freely available at https://github.com/kzkedzierska/sonics. Raw read data used in the experimental validation of SONiCS have been deposited in the Sequence Read Archive under accession number SRP135756. Supplementary information: Supplementary data are available at Bioinformatics online.
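Stutter correction by forward simulation can be illustrated with a toy Monte Carlo genotyper in Python. It is not SONiCS's model. The per-copy slippage probabilities (`p_down`, `p_up`), the amplification `efficiency`, the pool `cap`, the pseudocount likelihood, and the restriction to ±1-unit slips are all simplifying assumptions. For each candidate genotype, the sketch simulates PCR and converts the simulated allele pool into expected read frequencies. It then keeps the genotype that best explains the observed read counts.

```python
import math
import random
from collections import Counter
from itertools import combinations_with_replacement


def simulate_pcr(alleles, cycles, p_down, p_up, efficiency, rng, cap=1000):
    """Forward-simulate PCR; each new copy may slip by -1 or +1 repeat unit."""
    pool = list(alleles)
    for _ in range(cycles):
        copies = []
        for length in pool:
            if rng.random() >= efficiency:
                continue  # this template failed to amplify in this cycle
            r = rng.random()
            if r < p_down:
                copies.append(length - 1)
            elif r < p_down + p_up:
                copies.append(length + 1)
            else:
                copies.append(length)
        pool.extend(copies)
        if len(pool) > cap:
            pool = rng.sample(pool, cap)  # random downsampling keeps the sketch cheap
    return Counter(pool)


def log_likelihood(observed, expected, pseudo=1e-3):
    """Multinomial log-likelihood (up to a constant) under simulated frequencies."""
    support = set(observed) | set(expected)
    denom = sum(expected.values()) + pseudo * len(support)
    return sum(n * math.log((expected.get(k, 0) + pseudo) / denom)
               for k, n in observed.items())


def call_genotype(observed, cycles=12, p_down=0.02, p_up=0.005,
                  efficiency=0.9, runs=3, seed=0):
    """Return the diploid genotype (repeat counts) that best explains `observed`."""
    rng = random.Random(seed)
    candidates = range(min(observed), max(observed) + 2)
    best_ll, best_gt = -math.inf, None
    for genotype in combinations_with_replacement(candidates, 2):
        simulated = Counter()
        for _ in range(runs):
            simulated.update(simulate_pcr(genotype, cycles, p_down, p_up, efficiency, rng))
        ll = log_likelihood(observed, simulated)
        if ll > best_ll:
            best_ll, best_gt = ll, genotype
    return best_gt
```

For example, read counts concentrated at 10 and 12 repeat units, with smaller stutter peaks at 9 and 11, should be called as the heterozygote (10, 12) rather than as a genotype that treats a stutter peak as a true allele.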
Subject(s)
Genotyping Techniques; Microsatellite Repeats; Polymerase Chain Reaction; Software; Alleles; Animals; Computational Biology; Monte Carlo Method
ABSTRACT
Short tandem repeat (STR) variants are highly polymorphic markers that facilitate powerful population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic fluctuations. Massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery. Here, we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without a reference genome, and an approach for highly parallel targeted STR recovery. We employed our approach to capture a panel of 5000 STRs from a test group of diademed sifakas (Propithecus diadema, n = 3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci, with 97.3-99.6% of STRs characterized at ≥10x non-redundant sequence coverage. We then tested our STR capture strategy on P. diadema fecal DNA, and report robust initial results and suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from flanking regions. Our method provides a cost-effective and scalable solution for rapid recovery of large STR and SNP datasets in any species without needing a reference genome, and it can be used even with the suboptimal DNA more easily acquired in conservation and ecological studies.
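A plausible first step in a reference-free STR pipeline is to scan shotgun reads for perfect tandem repeats that have enough unique flanking sequence on both sides to design capture probes. The Python sketch below shows only that idea. The motif lengths (2-6 bp), the minimum copy number, the 30-bp flank requirement, and the function names are assumptions for illustration, not the authors' pipeline settings.

```python
def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter motif (e.g. "ACAC")."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_perfect_strs(seq, min_unit=2, max_unit=6, min_copies=5):
    """Return (start, motif, copies) for non-overlapping perfect tandem repeats."""
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break
            if not is_primitive(unit):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            # Prefer the motif that explains the longest stretch of sequence.
            if copies >= min_copies and (best is None or copies * k > len(best[1]) * best[2]):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the repeat
        else:
            i += 1
    return hits


def candidate_markers(reads, flank=30, **kwargs):
    """Keep STRs with at least `flank` bp of sequence on each side for probe design."""
    markers = []
    for name, seq in reads:
        for start, unit, copies in find_perfect_strs(seq, **kwargs):
            end = start + len(unit) * copies
            if start >= flank and len(seq) - end >= flank:
                markers.append((name, unit, copies, seq[start - flank:start], seq[end:end + flank]))
    return markers
```

The retained flanks are also where the abstract's additional SNP panels would come from, since they are sequenced alongside each captured repeat.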
Subject(s)
Genetic Markers; Genotyping Techniques/methods; High-Throughput Nucleotide Sequencing/methods; Microsatellite Repeats; Strepsirhini/genetics; Animals; Base Sequence; Endangered Species; Genetics, Population/methods; Genome, Human; Genotyping Techniques/veterinary; High-Throughput Nucleotide Sequencing/veterinary; Humans; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/veterinary
ABSTRACT
BACKGROUND: The cellular effects of androgen are transduced through the androgen receptor, which controls the expression of genes that regulate biosynthetic processes, cell growth, and metabolism. Androgen signaling also affects DNA damage signaling through mechanisms involving gene expression and transcription-associated DNA-damaging events. Defining the contributions of androgen signaling to DNA repair is important for understanding androgen receptor function, and it also has translational implications. METHODS: We generated RNA-seq data from multiple prostate cancer lines and used bioinformatic analyses to characterize androgen-regulated gene expression. We compared the results from cell lines with gene expression data from prostate cancer xenografts and patient samples to query how androgen signaling and prostate cancer progression influence the expression of DNA repair genes. We performed whole-genome sequencing to help characterize the status of the DNA repair machinery in widely used prostate cancer lines. Finally, we tested a DNA repair enzyme inhibitor for effects on androgen-dependent transcription. RESULTS: Our data indicate that androgen signaling regulates a subset of DNA repair genes that are largely specific to the respective model system and disease state. We identified deleterious mutations in the DNA repair genes RAD50 and CHEK2. We found that inhibition of the DNA repair enzyme MRE11 with the small molecule mirin inhibits androgen-dependent transcription and growth of prostate cancer cells. CONCLUSIONS: Our data support the view that crosstalk between androgen signaling and DNA repair occurs at multiple levels, and that DNA repair enzymes, in addition to PARPs, could be actionable targets in prostate cancer.
Subject(s)
Androgens/metabolism; DNA Repair/genetics; DNA, Neoplasm/genetics; Prostatic Neoplasms/genetics; Prostatic Neoplasms/metabolism; Receptors, Androgen/metabolism; Animals; Gene Expression Regulation, Neoplastic/drug effects; Humans; Male; PC-3 Cells; Protein Kinase Inhibitors/pharmacology; Signal Transduction/drug effects; Transcription, Genetic/drug effects
ABSTRACT
The Hawaiian mints (Lamiaceae), one of the largest endemic plant lineages in the archipelago, provide an excellent system to study rapid diversification of a lineage with a remote, likely paleohybrid origin. Since their divergence from New World mints 4-5 million years ago, the members of this lineage have diversified greatly and represent a remarkable array of vegetative and reproductive phenotypes. Today many members of this group are endangered or already extinct, and molecular phylogenetic work relies largely on herbarium samples collected during the last century. So far a gene-by-gene approach has been used, but the recent radiation of the Hawaiian mints has resulted in minimal sequence divergence and hence poor phylogenetic resolution. Tracing the reticulate evolutionary history of the lineage requires a resolved maternal phylogeny. We applied a high-throughput approach to sequence 12 complete or nearly complete plastid genomes from multiple Hawaiian mint species and relatives, including extinct and rare taxa. We also targeted 108 hypervariable regions from throughout the chloroplast genomes in nearly all of the remaining Hawaiian species, and relatives, using a next-generation amplicon sequencing approach. This procedure generated ~20 kb of sequence data for each taxon and considerably increased the total number of variable sites over previous analyses. Our results demonstrate the potential of high-throughput sequencing of historic material for evolutionary studies in rapidly evolving lineages. However, our study also highlights the challenges of resolving relationships within recent radiations, even at the genomic level.
Subject(s)
Endangered Species; Extinction, Biological; Genome, Plastid; Mentha/genetics; Phylogeny; Plastids/genetics; Base Pairing/genetics; Base Sequence; DNA Damage; Hawaii; Sequence Analysis, DNA
ABSTRACT
In Europe, the Ixodes ricinus tick is the most important vector of the etiological agents of Lyme borreliosis and several other emerging tick-borne diseases. Because tick-borne pathogens depend on their vectors for transmission, understanding vector population structure is crucial to inform public health research on pathogen dynamics and spread. However, the population structure and dynamics of this important vector species are not well understood, as most genetic studies use short mitochondrial and nuclear sequences with little diversity. Here we obtained and analyzed complete mitochondrial genome (hereafter "mitogenome") sequences to better understand the genetic diversity and population structure of I. ricinus from two long-standing tick-borne disease foci in northern Italy. Complete mitogenomes of 23 I. ricinus ticks were sequenced at high coverage. From these 23 mitogenome sequences we identified 17 unique haplotypes comprising 244 segregating sites. Phylogenetic reconstruction using 18 complete mitogenome sequences revealed the coexistence of four highly divergent I. ricinus maternal lineages, despite the narrow spatial scale over which these samples were obtained (100 km). Notably, the estimated coalescence time of the 18 mitogenome haplotypes is ~427 thousand years ago (95% HPD: 330-540 thousand years). This divergence between I. ricinus lineages is consistent with the mitochondrial diversity of other arthropod vector species and indicates that long-term I. ricinus populations may have been less structured and larger than previously thought. Thus, this study suggests that rapid and accurate retrieval of full mitochondrial genomes from this disease vector enables fine-resolution studies of tick intraspecies genetic relationships, population differentiation, and demographic history.
Subject(s)
Genome, Mitochondrial; Ixodes/classification; Animals; DNA/chemistry; DNA/isolation & purification; DNA/metabolism; Genetic Variation; Insect Vectors/microbiology; Italy; Ixodes/genetics; Lyme Disease/microbiology; Lyme Disease/pathology; Phylogeny; Sequence Analysis, DNA
ABSTRACT
The genetic structure of the indigenous hunter-gatherer peoples of southern Africa, the oldest known lineage of modern humans, is important for understanding human diversity. Studies based on mitochondrial and small sets of nuclear markers have shown that these hunter-gatherers, known as Khoisan, San, or Bushmen, are genetically divergent from other humans. However, until now, fully sequenced human genomes have been limited to recently diverged populations. Here we present the complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and a Bantu from southern Africa, as well as protein-coding regions from an additional three hunter-gatherers from disparate regions of the Kalahari. We characterize the extent of whole-genome and exome diversity among the five men, reporting 1.3 million novel DNA differences genome-wide, including 13,146 novel amino acid variants. In terms of nucleotide substitutions, the Bushmen seem to be, on average, more different from each other than, for example, a European and an Asian. Observed genomic differences between the hunter-gatherers and others may help to pinpoint genetic adaptations to an agricultural lifestyle. Adding the described variants to current databases will facilitate the inclusion of southern Africans in medical research efforts, particularly when family and medical histories can be correlated with genome-wide data.
Subject(s)
Black People/genetics; Ethnicity/genetics; Genome, Human/genetics; Asian People/genetics; Exons/genetics; Genetics, Medical; Humans; Phylogeny; Polymorphism, Single Nucleotide/genetics; South Africa/ethnology; White People/genetics
ABSTRACT
We performed a population genomics study of the aye-aye, a highly specialized nocturnal lemur from Madagascar. Aye-ayes have low population densities and extensive range requirements that could make this flagship species particularly susceptible to extinction. Therefore, knowledge of genetic diversity and differentiation among aye-aye populations is critical for conservation planning. Such information may also advance our general understanding of Malagasy biogeography, as aye-ayes have the largest species distribution of any lemur. We generated and analyzed whole-genome sequence data for 12 aye-ayes from three regions of Madagascar (North, West, and East). We found that the North population is genetically distinct, with strong differentiation from other aye-ayes over relatively short geographic distances. For comparison, the average FST value between the North and East aye-aye populations, which are separated by only 248 km, is more than 2.1 times greater than that observed between human Africans and Europeans. This finding is consistent with prior watershed- and climate-based hypotheses of a center of endemism in northern Madagascar. Taken together, these results suggest a strong and long-term biogeographical barrier to gene flow. Thus, the specific attention that should be directed toward preserving large, contiguous aye-aye habitats in northern Madagascar may also benefit the conservation of other distinct taxonomic units. To help facilitate future ecology- and conservation-motivated population genomic analyses by noncomputational biologists, the analytical toolkit used in this study is available on the Galaxy website.
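The abstract does not state which FST estimator was used. As a hedged illustration of the kind of calculation behind such comparisons, the Python sketch below implements Hudson's FST estimator combined across SNPs as a ratio of averages. It takes per-SNP allele frequencies and the number of sampled chromosomes in each population. The function name and input layout are assumptions.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float],
               n1: Sequence[int], n2: Sequence[int]) -> float:
    """Hudson's FST across SNPs, combined as a ratio of averages.

    p1, p2: alternate-allele frequencies per SNP in populations 1 and 2.
    n1, n2: number of sampled chromosomes per SNP (each must be > 1).
    """
    if not (len(p1) == len(p2) == len(n1) == len(n2)):
        raise ValueError("all inputs must have one entry per SNP")
    num = den = 0.0
    for a, b, m, n in zip(p1, p2, n1, n2):
        # Between-population difference, corrected for within-sample variance.
        num += (a - b) ** 2 - a * (1 - a) / (m - 1) - b * (1 - b) / (n - 1)
        # Expected between-population heterozygosity.
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den
```

A comparison like the North-versus-East contrast above amounts to computing this value for each population pair and taking the ratio of the results. Summing numerators and denominators before dividing keeps rare variants from dominating the estimate.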
Subject(s)
Genetics, Population; Genomics; Lemur/genetics; Lemur/physiology; Animals; Evolution, Molecular; Genome; Genotype; Geography; Internet; Madagascar; Phylogeny; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Time Factors
ABSTRACT
BACKGROUND: The discovery and mapping of genomic variants is an essential step in most analyses of sequencing reads. A number of mature software packages and associated pipelines can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. However, the same cannot be said for tools used to identify other types of variants. Indels are the second most frequent class of variants in the human genome, after SNPs. Reliable detection of indels remains challenging, especially for variants longer than a few bases. RESULTS: We have developed a set of algorithms and heuristics, collectively called indelMINER, to identify indels from whole-genome resequencing datasets using paired-end reads. indelMINER uses a split-read approach to identify precise breakpoints for indels smaller than a user-specified threshold, and supplements it with a paired-end approach to identify larger variants that the split-read approach frequently misses. We use simulated and real datasets to show that an implementation of the algorithm performs favorably compared with several existing tools. CONCLUSIONS: indelMINER can be used effectively to identify indels in whole-genome resequencing projects. The output is provided in VCF format, along with additional information about each variant, including its presence or absence in another sample. The source code and documentation for indelMINER can be freely downloaded from www.bx.psu.edu/miller_lab/indelMINER.tar.gz.
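The two kinds of evidence described above, split reads for small indels and paired-end spans for larger deletions, can be illustrated with a small Python sketch. It is not indelMINER's algorithm. The `SplitAlignment` record, its 0-based half-open coordinates, the `max_split_size` cutoff standing in for the user-specified threshold, and the mean-insert-size estimate for larger deletions are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SplitAlignment:
    """Hypothetical read aligned in two pieces (0-based, half-open coordinates)."""
    read_seq: str
    prefix_read_end: int     # read[:prefix_read_end] aligns ...
    prefix_ref_end: int      # ... ending just before this reference position
    suffix_read_start: int   # read[suffix_read_start:] aligns ...
    suffix_ref_start: int    # ... starting at this reference position


def indel_from_split(aln: SplitAlignment, max_split_size: int = 50):
    """Return (type, ref_pos, length, inserted_seq), or None if not a simple indel."""
    ref_gap = aln.suffix_ref_start - aln.prefix_ref_end
    read_gap = aln.suffix_read_start - aln.prefix_read_end
    if 0 < ref_gap <= max_split_size and read_gap == 0:
        # Reference bases skipped by the read: a deletion.
        return ("DEL", aln.prefix_ref_end, ref_gap, "")
    if 0 < read_gap <= max_split_size and ref_gap == 0:
        # Read bases with no reference counterpart: an insertion.
        inserted = aln.read_seq[aln.prefix_read_end:aln.suffix_read_start]
        return ("INS", aln.prefix_ref_end, read_gap, inserted)
    return None  # complex, overlapping, or larger than the split-read threshold


def deletion_size_from_pairs(spans, insert_mean):
    """Paired-end estimate: how much longer mate spans are than the expected insert."""
    return round(mean(spans) - insert_mean)
```

The split-read path gives base-pair breakpoints but only works when both halves of a read align. The paired-end estimate covers larger events, but it yields only an approximate size, which matches the division of labour the abstract describes.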
Subject(s)
Algorithms; Biomarkers, Tumor/genetics; Genome, Human; High-Throughput Nucleotide Sequencing/methods; INDEL Mutation/genetics; Neoplasms/genetics; Sequence Analysis, DNA/methods; Case-Control Studies; Genomics/methods; Humans; Polymorphism, Single Nucleotide/genetics
ABSTRACT
BACKGROUND: With the development of inexpensive, high-throughput sequencing technologies, it has become feasible to examine questions related to population genetics and molecular evolution of non-model species in their ecological contexts on a genome-wide scale. Here, we employed a newly developed suite of integrated, web-based programs to examine population dynamics and signatures of selection across the genome using several well-established tests, including FST, pN/pS, and McDonald-Kreitman. We applied these techniques to study populations of honey bees (Apis mellifera) in East Africa. In Kenya, there are several described A. mellifera subspecies, which are thought to be localized to distinct ecological regions. RESULTS: We performed whole-genome sequencing of 11 worker honey bees from apiaries distributed throughout Kenya and identified 3.6 million putative single-nucleotide polymorphisms. The dense coverage allowed us to apply several computational procedures to study population structure and the evolutionary relationships among the populations, and to detect signs of adaptive evolution across the genome. While there is considerable gene flow among the sampled populations, there are clear distinctions between populations from the northern desert region and those from the temperate savannah region. We identified several genes showing population genetic patterns consistent with positive selection within African bee populations, and between these populations and European A. mellifera or Asian Apis florea. CONCLUSIONS: These results lay the groundwork for future studies of adaptive ecological evolution in honey bees, and demonstrate the use of new, freely available web-based tools and workflows (http://usegalaxy.org/r/kenyanbee) that can be applied to any model system with genomic information.
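The McDonald-Kreitman test named above compares nonsynonymous and synonymous changes fixed between species (Dn, Ds) with those polymorphic within a species (Pn, Ps). A minimal Python sketch, assuming the usual 2x2 layout, computes three quantities. The first is the neutrality index NI = (Pn/Ps)/(Dn/Ds). The second is the derived estimate alpha = 1 - NI of the fraction of adaptive substitutions. The third is a two-sided Fisher's exact test. It is not the Galaxy tools' implementation, and the function names are illustrative.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> dict:
    """McDonald-Kreitman summary for the table [[Dn, Ds], [Pn, Ps]]."""
    ni = (pn / ps) / (dn / ds)  # neutrality index; assumes ps, dn, ds > 0
    return {"NI": ni, "alpha": 1 - ni, "p_value": fisher_exact_2x2(dn, ds, pn, ps)}
```

An NI below 1 (alpha above 0) means an excess of nonsynonymous divergence relative to polymorphism, the pattern usually read as evidence of positive selection.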
Subject(s)
Bees/genetics , Genome, Insect/genetics , Selection, Genetic/genetics , Transcriptome/genetics , Animals , Evolution, Molecular , Genetics, Population/methods , Genomics/methods , Kenya , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Population Dynamics
ABSTRACT
Humans first arrived on Madagascar only a few thousand years ago. Subsequent habitat destruction and hunting activities have had significant impacts on the island's biodiversity, including the extinction of megafauna. For example, we know of 17 recently extinct 'subfossil' lemur species, all of which were substantially larger (body mass ~11-160 kg) than any living population of the ~100 extant lemur species (largest body mass ~6.8 kg). We used ancient DNA and genomic methods to study subfossil lemur extinction biology and update our understanding of extant lemur conservation risk factors by i) reconstructing a comprehensive phylogeny of extinct and extant lemurs, and ii) testing whether low genetic diversity is associated with body size and extinction risk. We recovered complete or near-complete mitochondrial genomes from five subfossil lemur taxa, and generated sequence data from population samples of two extinct and eight extant lemur species. Phylogenetic comparisons resolved prior taxonomic uncertainties and confirmed that the extinct subfossil species did not comprise a single clade. Genetic diversity estimates for the two sampled extinct species were relatively low, suggesting small historical population sizes. Low genetic diversity and small population sizes are both risk factors that would have rendered giant lemurs especially susceptible to extinction. Surprisingly, among the extant lemurs, we did not observe a relationship between body size and genetic diversity. The decoupling of these variables suggests that risk factors other than body size may have as much or more meaning for establishing future lemur conservation priorities.
Subject(s)
Body Size , Extinction, Biological , Genomics/methods , Lemur , Paleontology/methods , Animals , Body Size/genetics , Body Size/physiology , DNA/analysis , DNA/genetics , Fossils , Lemur/classification , Lemur/genetics , Lemur/physiology , Madagascar , Phylogeny
ABSTRACT
Polar bears (PBs) are superbly adapted to the extreme Arctic environment and have become emblematic of the threat to biodiversity from global climate change. Their divergence from the lower-latitude brown bear provides a textbook example of rapid evolution of distinct phenotypes. However, limited mitochondrial and nuclear DNA evidence conflicts on the timing of PB origin and on whether the species falls within or is sister to the brown bear lineage. We gathered extensive genomic sequence data from contemporary polar, brown, and American black bear samples, in addition to a 130,000- to 110,000-year-old PB, to examine this problem from a genome-wide perspective. Nuclear DNA markers reflect a species tree consistent with expectation, showing polar and brown bears to be sister species. However, for the enigmatic brown bears native to Alaska's Alexander Archipelago, we estimate that not only their mitochondrial genome, but also 5-10% of their nuclear genome, is most closely related to PBs, indicating ancient admixture between the two species. Explicit admixture analyses are consistent with ancient splits among PBs, brown bears, and black bears that were later followed by occasional admixture. We also provide paleodemographic estimates that suggest bear evolution has tracked key climate events, and that PB in particular experienced a prolonged and dramatic decline in its effective population size during the last ca. 500,000 years. We demonstrate that brown bears and PBs have had sufficiently independent evolutionary histories over the last 4-5 million years to leave imprints in the PB nuclear genome that likely are associated with ecological adaptation to the Arctic environment.