ABSTRACT
As recently demonstrated by the COVID-19 pandemic, large-scale pathogen genomic data are crucial to characterize transmission patterns of human infectious diseases. Yet, current methods to process raw sequence data into analysis-ready variants remain slow to scale, hampering rapid surveillance efforts and epidemiological investigations for disease control. Here, we introduce an accelerated, scalable, reproducible, and cost-effective framework for pathogen genomic variant identification and present an evaluation of its performance and accuracy across benchmark datasets of Plasmodium falciparum malaria genomes. We demonstrate superior performance of the GPU framework relative to standard pipelines with mean execution time and computational costs reduced by 27× and 4.6×, respectively, while delivering 99.9% accuracy at enhanced reproducibility.
Subject(s)
COVID-19 , Communicable Diseases , Malaria , COVID-19/epidemiology , COVID-19/genetics , Genomics/methods , Humans , Pandemics , Reproducibility of Results
ABSTRACT
Detection of de novo variants (DNVs) is critical for studies of disease-related variation and mutation rates. To accelerate DNV calling, we developed a graphics processing unit (GPU)-based workflow. We applied our workflow to whole-genome sequencing data from three parent-child sequenced cohorts, the Simons Simplex Collection (SSC), Simons Foundation Powering Autism Research (SPARK), and the 1000 Genomes Project (1000G), which were sequenced using DNA from blood, saliva, and lymphoblastoid cell lines (LCLs), respectively. The SSC and SPARK DNV callsets were within expectations for number of DNVs, percent at CpG sites, phasing to the paternal chromosome of origin, and average allele balance. However, the 1000G DNV callset was not within expectations and contained excessive DNVs that are likely cell line artifacts. Mutation signature analysis revealed that 30% of 1000G DNV signatures matched B-cell lymphoma. Furthermore, we found variants in DNA repair genes and at ClinVar pathogenic or likely-pathogenic sites, and a significant excess of protein-coding DNVs in IGLL5, a gene known to be involved in B-cell lymphomas. Our study provides a new rapid DNV caller for the field and elucidates important implications of using sequencing data from LCLs for reference building and disease-related projects.
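As a concrete illustration of what DNV calling checks, a de novo variant is one observed in the child but in neither parent. The following is a minimal sketch of that trio filter only, not the GPU workflow described in the abstract; the `TrioSite` record, the genotype encoding (alternate-allele counts), and the allele-balance window of 0.3 to 0.7 are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TrioSite:
    """Hypothetical per-site record; genotypes are alternate-allele counts (0, 1, 2)."""
    chrom: str
    pos: int
    child_gt: int
    father_gt: int
    mother_gt: int
    child_alt_reads: int
    child_total_reads: int


def allele_balance(site: TrioSite) -> float:
    """Fraction of the child's reads that support the alternate allele."""
    if site.child_total_reads == 0:
        return 0.0
    return site.child_alt_reads / site.child_total_reads


def is_candidate_dnv(site: TrioSite, min_ab: float = 0.3, max_ab: float = 0.7) -> bool:
    """Child heterozygous, both parents homozygous reference, balanced read support.

    The allele-balance window is an assumed threshold chosen for illustration.
    """
    return (
        site.child_gt == 1
        and site.father_gt == 0
        and site.mother_gt == 0
        and min_ab <= allele_balance(site) <= max_ab
    )


# Made-up example sites, not data from the study.
sites = [
    TrioSite("chr1", 1000, 1, 0, 0, 14, 30),  # het child, ref parents, balanced reads
    TrioSite("chr1", 2000, 1, 1, 0, 15, 30),  # father carries the allele: inherited
    TrioSite("chr2", 3000, 1, 0, 0, 3, 40),   # skewed allele balance: likely artifact
]
candidates = [s for s in sites if is_candidate_dnv(s)]
```

An average allele balance near 0.5 is what a true heterozygous germline variant would show, which is why a callset-level shift away from it can point to artifacts such as those reported for the LCL-derived samples.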
Subject(s)
Neoplasms , Humans , Alleles , Mutation , Neoplasms/genetics , Whole Genome Sequencing
ABSTRACT
Harbor porpoise in the North Pacific are found in coastal waters from southern California to Japan, but population structure is poorly known outside of a few local areas. We used multiplexed amplicon sequencing of 292 loci and genotyped clusters of single nucleotide polymorphisms as microhaplotypes (N = 271 samples) in addition to mitochondrial (mtDNA) sequence data (N = 413 samples) to examine the genetic structure from samples collected along the Pacific coast and inland waterways from California to southern British Columbia. We confirmed an overall pattern of strong isolation-by-distance, suggesting that individual dispersal is restricted. We also found evidence of regions where genetic differences are larger than expected based on geographical distance alone, implying current or historical barriers to gene flow. In particular, the southernmost population in California is genetically distinct (FST = 0.02 [microhaplotypes]; 0.31 [mtDNA]), with both reduced genetic variability and high frequency of an otherwise rare mtDNA haplotype. At the northern end of our study range, we found significant genetic differentiation of samples from the Strait of Georgia, previously identified as a potential biogeographical boundary or secondary contact zone between harbor porpoise populations. Association of microhaplotypes with remotely sensed environmental variables indicated potential local adaptation, especially at the southern end of the species' range. These results inform conservation and management for this nearshore species, illustrate the value of genomic methods for detecting patterns of genetic structure within a continuously distributed marine species, and highlight the power of microhaplotype genotyping for detecting genetic structure in harbor porpoises despite reliance on poor-quality samples.
Subject(s)
Phocoena , Animals , British Columbia , DNA, Mitochondrial/genetics , Gene Flow , Genetic Variation , Genetics, Population , Georgia , Japan , Phocoena/genetics
ABSTRACT
Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using the inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The third top hit, CCRN4L, plays a major role in lipid metabolism, a finding supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlap between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.
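One common way to turn a demographic null model into a false discovery rate is to compare how often simulated neutral windows exceed a threshold with how often observed windows do. The sketch below assumes that formulation; it is not the authors' procedure, and the function name `empirical_fdr` and all numbers are hypothetical.

```python
def empirical_fdr(observed, null_sims, threshold):
    """Estimate FDR at a threshold from a simulated null distribution.

    FDR ~= (expected number of null windows >= threshold) / (observed windows >= threshold),
    where the expected count scales the null exceedance fraction to the observed sample size.
    """
    n_called = sum(1 for x in observed if x >= threshold)
    if n_called == 0:
        return 0.0
    null_fraction = sum(1 for x in null_sims if x >= threshold) / len(null_sims)
    expected_false = null_fraction * len(observed)
    return min(1.0, expected_false / n_called)


# Illustrative values only: a selection statistic for 10 observed windows and
# 100 windows simulated under a neutral demographic model.
observed_stats = [0.1, 0.2, 0.3, 0.9, 0.95, 0.8, 0.4, 0.5, 0.85, 0.2]
null_stats = [0.1] * 98 + [0.85, 0.9]

fdr_at_0_8 = empirical_fdr(observed_stats, null_stats, threshold=0.8)
```

In this toy input, 2% of null windows reach the threshold, so about 0.2 of the 10 observed windows are expected to be false positives against 4 windows actually called. A null that ignores demography would generally misestimate that exceedance fraction, which is the motivation for a demography-aware scan.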
Subject(s)
Genetics, Population , Genomics , Lipid Metabolism/genetics , Selection, Genetic , Animals , Demography , Dogs , Genome , Polymorphism, Single Nucleotide
ABSTRACT
Primary triple-negative breast cancers (TNBCs), a tumour type defined by lack of oestrogen receptor, progesterone receptor and ERBB2 gene amplification, represent approximately 16% of all breast cancers. Here we show in 104 TNBC cases that at the time of diagnosis these cancers exhibit a wide and continuous spectrum of genomic evolution, with some having only a handful of coding somatic aberrations in a few pathways, whereas others contain hundreds of coding somatic mutations. High-throughput RNA sequencing (RNA-seq) revealed that only approximately 36% of mutations are expressed. Using deep re-sequencing measurements of allelic abundance for 2,414 somatic mutations, we determine, for the first time to our knowledge in an epithelial tumour subtype, the relative abundance of clonal frequencies among cases representative of the population. We show that TNBCs vary widely in their clonal frequencies at the time of diagnosis, with the basal subtype of TNBC showing more variation than non-basal TNBC. Although p53 (also known as TP53), PIK3CA and PTEN somatic mutations seem to be clonally dominant compared to other genes, in some tumours their clonal frequencies are incompatible with founder status. Mutations in cytoskeletal, cell shape and motility proteins occurred at lower clonal frequencies, suggesting that they occurred later during tumour progression. Taken together, our results show that understanding the biology and therapeutic responses of patients with TNBC will require the determination of individual tumour clonal genotypes.
Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Evolution, Molecular , Mutation/genetics , Alleles , Breast Neoplasms/diagnosis , Clone Cells/metabolism , Clone Cells/pathology , DNA Copy Number Variations/genetics , DNA Mutational Analysis , Disease Progression , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic/genetics , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation/genetics , Point Mutation/genetics , Precision Medicine , Reproducibility of Results , Sequence Analysis, RNA
ABSTRACT
OBJECTIVES: The objectives of this study were to assess whether targeted investigation for tumor-specific mutations by ultradeep DNA sequencing of peritoneal washes of ovarian cancer patients after primary surgical debulking and chemotherapy, and clinically diagnosed as disease free, provides a more sensitive and specific method to assess actual treatment response and tailor future therapy, and to compare this "molecular second look" with conventional cytology and histopathology-based findings. METHODS/MATERIALS: We identified 10 patients with advanced-stage, high-grade serous ovarian cancer who had undergone second-look laparoscopy and for whom DNA could be isolated from biobanked paired blood, primary and recurrent tumor, and second-look peritoneal washes. A targeted 56-gene cancer-relevant panel was used for next-generation sequencing (average coverage, >6500×). Mutations were validated using either digital droplet polymerase chain reaction (ddPCR) or Sanger sequencing. RESULTS: A total of 25 tumor-specific mutations were identified (median, 2/patient; range, 1-8). TP53 mutations were identified in at least 1 sample from all patients. All 5 pathology-based second-look positive patients were confirmed positive by molecular second look. Genetic analysis revealed that 3 of the 5 pathology-based negative second looks were actually positive. In 2 of these patients, the second-look mutations were present in either the original primary or recurrent tumors. In the third, 2 high-frequency, novel frameshift mutations in MSH6 and HNF1A were identified. CONCLUSIONS: The molecular second look detects tumor-specific evidence of residual disease and provides genetic insight into tumor evolution and future recurrences beyond standard pathology. In the precision medicine era, detecting and genetically characterizing residual disease after standard treatment will be invaluable for improving patient outcomes.
Subject(s)
Cystadenocarcinoma, Serous/genetics , Ovarian Neoplasms/genetics , Aged , Alleles , Cystadenocarcinoma, Serous/pathology , DNA Mutational Analysis , DNA, Neoplasm/genetics , DNA, Neoplasm/isolation & purification , Female , High-Throughput Nucleotide Sequencing , Humans , Middle Aged , Mutation , Ovarian Neoplasms/pathology , Precision Medicine/methods , Proof of Concept Study
ABSTRACT
Intra-tumor heterogeneity is a hallmark of many cancers and may lead to therapy resistance or interfere with personalized treatment strategies. Here, we combined topographic mapping of somatic breakpoints and transcriptional profiling to probe intra-tumor heterogeneity of treatment-naïve stage IIIC/IV epithelial ovarian cancer. We observed that most substantial differences in genomic rearrangement landscapes occurred between metastases in the omentum and peritoneum versus tumor sites in the ovaries. Several cancer genes such as NF1, CDKN2A, and FANCD2 were affected by lesion-specific breakpoints. Furthermore, the intra-tumor variability involved different mutational hallmarks including lesion-specific kataegis (local mutation shower coinciding with genomic breakpoints), rearrangement classes, and coding mutations. In one extreme case, we identified two independent TP53 mutations in ovary tumors and omentum/peritoneum metastases, respectively. Examination of gene expression dynamics revealed up-regulation of key cancer pathways including WNT, integrin, chemokine, and Hedgehog signaling in only subsets of tumor samples from the same patient. Finally, we took advantage of the multilevel tumor analysis to understand the effects of genomic breakpoints on qualitative and quantitative gene expression changes. We show that intra-tumor gene expression differences are caused by site-specific genomic alterations, including formation of in-frame fusion genes. These data highlight the plasticity of ovarian cancer genomes, which may contribute to their strong capacity to adapt to changing environmental conditions and give rise to the high rate of recurrent disease following standard treatment regimes.
Subject(s)
Chromosome Aberrations , Gene Expression Regulation, Neoplastic , Genome, Human , Ovarian Neoplasms/genetics , Aged , Cyclin-Dependent Kinase Inhibitor p16/genetics , Fanconi Anemia Complementation Group D2 Protein/genetics , Female , Gene Expression Profiling , Humans , Middle Aged , Neoplasm Metastasis , Neoplasm Staging , Neurofibromatosis 1/genetics , Omentum/metabolism , Omentum/pathology , Oncogene Proteins, Fusion/genetics , Ovarian Neoplasms/pathology , Peritoneum/metabolism , Peritoneum/pathology , Tumor Suppressor Protein p53/genetics
ABSTRACT
The somatic mutation burden in healthy white blood cells (WBCs) is not well known. Based on deep whole-genome sequencing, we estimate that approximately 450 somatic mutations accumulated in the nonrepetitive genome within the healthy blood compartment of a 115-yr-old woman. The detected mutations appear to have been harmless passenger mutations: They were enriched in noncoding, AT-rich regions that are not evolutionarily conserved, and they were depleted for genomic elements where mutations might have favorable or adverse effects on cellular fitness, such as regions with actively transcribed genes. The distribution of variant allele frequencies of these mutations suggests that the majority of the peripheral white blood cells were offspring of two related hematopoietic stem cell (HSC) clones. Moreover, telomere lengths of the WBCs were significantly shorter than telomere lengths from other tissues. Together, this suggests that the finite lifespan of HSCs, rather than somatic mutation effects, may lead to hematopoietic clonal evolution at extreme ages.
Subject(s)
Clonal Evolution , Hematopoiesis , Leukocytes/metabolism , Longevity/genetics , Mutation , AT Rich Sequence , Aged, 80 and over , Cell Lineage , Conserved Sequence , Female , Gene Frequency , Genome , Hematopoietic Stem Cells/cytology , Hematopoietic Stem Cells/metabolism , Hematopoietic Stem Cells/physiology , Humans , Leukocytes/cytology , Leukocytes/physiology , Telomere/genetics , Telomere Shortening
ABSTRACT
Genome sequencing of the 5,300-year-old mummy of the Tyrolean Iceman, found in 1991 on a glacier near the border of Italy and Austria, has yielded new insights into his origin and relationship to modern European populations. A key finding of that study was an apparent recent common ancestry with individuals from Sardinia, based largely on the Y chromosome haplogroup and common autosomal SNP variation. Here, we compiled and analyzed genomic datasets from both modern and ancient Europeans, including genome sequence data from over 400 Sardinians and two ancient Thracians from Bulgaria, to investigate this result in greater detail and determine its implications for the genetic structure of Neolithic Europe. Using whole-genome sequencing data, we confirm that the Iceman is, indeed, most closely related to Sardinians. Furthermore, we show that this relationship extends to other individuals from cultural contexts associated with the spread of agriculture during the Neolithic transition, in contrast to individuals from a hunter-gatherer context. We hypothesize that this genetic affinity of ancient samples from different parts of Europe with Sardinians represents a common genetic component that was geographically widespread across Europe during the Neolithic, likely related to migrations and population expansions associated with the spread of agriculture.
Subject(s)
Fossils , Genetics, Population , Genome, Human , Europe , Female , Humans , Polymorphism, Single Nucleotide
ABSTRACT
To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11-16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary.
Subject(s)
Amylases/genetics , Animals, Domestic/genetics , DNA Copy Number Variations/genetics , Evolution, Molecular , Animals , DNA, Mitochondrial/genetics , Diet , Dogs , Genetic Variation , Phylogeny , Population Density , Wolves/classification , Wolves/genetics
ABSTRACT
BACKGROUND: Endometrial cancer is the most common gynecologic malignancy, and its incidence and associated mortality are increasing. Despite the immediate need to detect these cancers at an earlier stage, there is no effective screening methodology or protocol for endometrial cancer. The comprehensive, genomics-based analysis of endometrial cancer by The Cancer Genome Atlas (TCGA) revealed many of the molecular defects that define this cancer. Based on these cancer genome results, and in a prospective study, we hypothesized that the use of ultra-deep, targeted gene sequencing could detect somatic mutations in uterine lavage fluid obtained from women undergoing hysteroscopy as a means of molecular screening and diagnosis. METHODS AND FINDINGS: Uterine lavage and paired blood samples were collected and analyzed from 107 consecutive patients who were undergoing hysteroscopy and curettage for diagnostic evaluation from this single-institution study. The lavage fluid was separated into cellular and acellular fractions by centrifugation. Cellular and cell-free DNA (cfDNA) were isolated from each lavage. Two targeted next-generation sequencing (NGS) gene panels, one composed of 56 genes and the other of 12 genes, were used for ultra-deep sequencing. To rule out potential NGS-based errors, orthogonal mutation validation was performed using digital PCR and Sanger sequencing. Seven patients were diagnosed with endometrial cancer based on classic histopathologic analysis. Six of these patients had stage IA cancer, and one of these cancers was only detectable as a microscopic focus within a polyp. All seven patients were found to have significant cancer-associated gene mutations in both cell pellet and cfDNA fractions. In the four patients in whom adequate tumor sample was available, all tumor mutations above a specific allele fraction were present in the uterine lavage DNA samples. 
Mutations originally only detected in lavage fluid fractions were later confirmed to be present in tumor but at allele fractions significantly less than 1%. Of the remaining 95 patients diagnosed with benign or non-cancer pathology, 44 had no significant cancer mutations detected. Intriguingly, 51 patients without histopathologic evidence of cancer had relatively high allele fraction (1.0%-30.4%), cancer-associated mutations. Participants with detected driver and potential driver mutations were significantly older (mean age mutated = 57.96, 95% confidence interval [CI]: 3.30-∞, mean age no mutations = 50.35; p-value = 0.002; Benjamini-Hochberg [BH] adjusted p-value = 0.015) and more likely to be post-menopausal (p-value = 0.004; BH-adjusted p-value = 0.015) than those without these mutations. No associations were detected between mutation status and race/ethnicity, body mass index, diabetes, parity, and smoking status. Long-term follow-up was not presently available in this prospective study for those women without histopathologic evidence of cancer. CONCLUSIONS: Using ultra-deep NGS, we identified somatic mutations in DNA extracted both from cell pellets and a never previously reported cfDNA fraction from the uterine lavage. Using our targeted sequencing approach, endometrial driver mutations were identified in all seven women who received a cancer diagnosis based on classic histopathology of tissue curettage obtained at the time of hysteroscopy. In addition, relatively high allele fraction driver mutations were identified in the lavage fluid of approximately half of the women without a cancer diagnosis. Increasing age and post-menopausal status were associated with the presence of these cancer-associated mutations, suggesting the prevalent existence of a premalignant landscape in women without clinical evidence of cancer. 
Given that a uterine lavage can be easily and quickly performed even outside of the operating room and in a physician's office-based setting, our findings suggest the future possibility of this approach for screening women for the earliest stages of endometrial cancer. However, our findings suggest that further insight into development of cancer or its interruption are needed before translation to the clinic.
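The Benjamini-Hochberg (BH) adjusted p-values reported in this abstract come from a standard multiple-testing correction. The sketch below shows how that step-up adjustment works in general. The input p-values are illustrative only, since the abstract does not state how many tests entered the correction, so the study's adjusted values cannot be reproduced from it.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a running
    minimum is taken from the largest rank downwards so that adjusted values
    never decrease as the raw p-values increase.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative raw p-values (not taken from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

With these four tests the two smallest p-values both adjust to about 0.02 and the two largest to about 0.04, showing how several raw p-values can share one adjusted value, as the two BH-adjusted values of 0.015 in the abstract do.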
Subject(s)
DNA, Neoplasm , Endometrial Neoplasms/genetics , Genome , Mutation , Uterus/metabolism , Adult , Aged , Aged, 80 and over , Cross-Sectional Studies , Endometrial Neoplasms/pathology , Female , Humans , Middle Aged , Prospective Studies , Therapeutic Irrigation
ABSTRACT
The genetic structure of the indigenous hunter-gatherer peoples of southern Africa, the oldest known lineage of modern human, is important for understanding human diversity. Studies based on mitochondrial and small sets of nuclear markers have shown that these hunter-gatherers, known as Khoisan, San, or Bushmen, are genetically divergent from other humans. However, until now, fully sequenced human genomes have been limited to recently diverged populations. Here we present the complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and a Bantu from southern Africa, as well as protein-coding regions from an additional three hunter-gatherers from disparate regions of the Kalahari. We characterize the extent of whole-genome and exome diversity among the five men, reporting 1.3 million novel DNA differences genome-wide, including 13,146 novel amino acid variants. In terms of nucleotide substitutions, the Bushmen seem to be, on average, more different from each other than, for example, a European and an Asian. Observed genomic differences between the hunter-gatherers and others may help to pinpoint genetic adaptations to an agricultural lifestyle. Adding the described variants to current databases will facilitate inclusion of southern Africans in medical research efforts, particularly when family and medical histories can be correlated with genome-wide data.
Subject(s)
Black People/genetics , Ethnicity/genetics , Genome, Human/genetics , Asian People/genetics , Exons/genetics , Genetics, Medical , Humans , Phylogeny , Polymorphism, Single Nucleotide/genetics , South Africa/ethnology , White People/genetics
ABSTRACT
OBJECTIVE: Breath testing and duodenal culture studies suggest that a significant proportion of irritable bowel syndrome (IBS) patients have small intestinal bacterial overgrowth. In this study, we extended these data through 16S rDNA amplicon sequencing and quantitative PCR (qPCR) analyses of duodenal aspirates from a large cohort of IBS, non-IBS and control subjects. MATERIALS AND METHODS: Consecutive subjects presenting for esophagogastroduodenoscopy only and healthy controls were recruited. Exclusion criteria included recent antibiotic or probiotic use. Following extensive medical work-up, patients were evaluated for symptoms of IBS. DNAs were isolated from duodenal aspirates obtained during endoscopy. Microbial populations in a subset of IBS subjects and controls were compared by 16S profiling. Duodenal microbes were then quantitated in the entire cohort by qPCR and the results compared with quantitative live culture data. RESULTS: A total of 258 subjects were recruited (21 healthy, 163 non-healthy non-IBS, and 74 IBS). 16S profiling in five IBS and five control subjects revealed significantly lower microbial diversity in the duodenum in IBS, with significant alterations in 12 genera (false discovery rate < 0.15), including overrepresentation of Escherichia/Shigella (p = 0.005) and Aeromonas (p = 0.051) and underrepresentation of Acinetobacter (p = 0.024), Citrobacter (p = 0.031) and Microvirgula (p = 0.036). qPCR in all 258 subjects confirmed greater levels of Escherichia coli in IBS and also revealed increases in Klebsiella spp, which correlated strongly with quantitative culture data. CONCLUSIONS: 16S rDNA sequencing confirms microbial overgrowth in the small bowel in IBS, with a concomitant reduction in diversity. qPCR supports alterations in specific microbial populations in IBS.
Subject(s)
DNA, Bacterial/analysis , DNA, Bacterial/isolation & purification , Duodenum/microbiology , Feces/microbiology , Gastrointestinal Microbiome/genetics , Irritable Bowel Syndrome/microbiology , Adult , Aged , Aged, 80 and over , Case-Control Studies , Endoscopy, Gastrointestinal , Female , Humans , Male , Middle Aged , Prospective Studies , Real-Time Polymerase Chain Reaction
ABSTRACT
Using a combination of whole-genome resequencing and high-density genotyping arrays, genome-wide haplotypes were reconstructed for two of the most important bulls in the history of the dairy cattle industry, Pawnee Farm Arlinda Chief ("Chief") and his son Walkway Chief Mark ("Mark"), each accounting for ~7% of all current genomes. We aligned 20.5 Gbp (~7.3× coverage) and 37.9 Gbp (~13.5× coverage) of the Chief and Mark genomic sequences, respectively. More than 1.3 million high-quality SNPs were detected in Chief and Mark sequences. The genome-wide haplotypes inherited by Mark from Chief were reconstructed using ~1 million informative SNPs. Comparison of a set of 15,826 SNPs that overlapped in the sequence-based and BovineSNP50 SNPs showed the accuracy of the sequence-based haplotype reconstruction to be as high as 97%. By using the BovineSNP50 genotypes, the frequencies of Chief alleles on his two haplotypes then were determined in 1,149 of his descendants, and the distribution was compared with the frequencies that would be expected assuming no selection. We identified 49 chromosomal segments in which Chief alleles showed strong evidence of selection. Candidate polymorphisms for traits that have been under selection in the dairy cattle population then were identified by referencing Chief's DNA sequence within these selected chromosome blocks. Eleven candidate genes were identified with functions related to milk-production, fertility, and disease-resistance traits. These data demonstrate that haplotype reconstruction of an ancestral proband by whole-genome resequencing in combination with high-density SNP genotyping of descendants can be used for rapid, genome-wide identification of the ancestor's alleles that have been subjected to artificial selection.
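The coverage figures in this abstract can be sanity-checked with the usual relation, coverage = aligned bases / genome size. The short sketch below inverts that relation for both bulls using only the numbers quoted above (20.5 Gbp at about 7.3× and 37.9 Gbp at about 13.5×); the function name is hypothetical, and the resulting genome size is an implied value, not one stated in the abstract.

```python
def implied_genome_size_gbp(aligned_gbp: float, coverage: float) -> float:
    """Genome size implied by coverage = aligned bases / genome size."""
    return aligned_gbp / coverage


# Figures quoted in the abstract for the two bulls.
chief_size = implied_genome_size_gbp(20.5, 7.3)
mark_size = implied_genome_size_gbp(37.9, 13.5)

print(f"Chief: {chief_size:.2f} Gbp, Mark: {mark_size:.2f} Gbp")
```

Both samples imply roughly the same genome size (about 2.8 Gbp), which indicates that the two coverage figures were computed against the same reference length.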
Subject(s)
Breeding/methods , Cattle/genetics , Genome/genetics , Haplotypes/genetics , Selection, Genetic , Animals , Base Sequence , Genetic Association Studies/veterinary , Genotype , Male , Molecular Sequence Data , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Vibrio cholerae is a globally dispersed pathogen that has evolved with humans for centuries, but also includes non-pathogenic environmental strains. Here, we identify the genomic variability underlying this remarkable persistence across the three major niche dimensions: space, time, and habitat. RESULTS: Taking an innovative approach of genome-wide association applicable to microbial genomes (GWAS-M), we classify 274 complete V. cholerae genomes by niche, including 39 newly sequenced for this study with the Ion Torrent DNA-sequencing platform. Niche metadata were collected for each strain and analyzed together with comprehensive annotations of genetic and genomic attributes, including point mutations (single-nucleotide polymorphisms, SNPs), protein families, functions and prophages. CONCLUSIONS: Our analysis revealed that genomic variations, in particular mobile functions including phages, prophages, transposable elements, and plasmids, underlie the metadata structuring in each of the three niche dimensions. This underscores the role of phages and mobile elements as the most rapidly evolving elements in bacterial genomes, creating local endemicity (space), leading to temporal divergence (time), and allowing the invasion of new habitats. Together, we take a data-driven approach for comparative functional genomics that exploits high-volume genome sequencing and annotation, in conjunction with novel statistical and machine learning analyses to identify connections between genotype and phenotype on a genome-wide scale.
Subject(s)
Genome, Bacterial , Vibrio cholerae/genetics , Cholera/epidemiology , Cholera/microbiology , DNA Transposable Elements , Environmental Microbiology , Evolution, Molecular , Genetic Variation , Genotype , Humans , Molecular Sequence Annotation , Phylogeny , Phylogeography , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Vibrio cholerae/isolation & purification
ABSTRACT
High-throughput sequencing of the taxonomically informative 16S rRNA gene provides a powerful approach for exploring microbial diversity. Here we compare the performances of two common "benchtop" sequencing platforms, Illumina MiSeq and Ion Torrent Personal Genome Machine (PGM), for bacterial community profiling by 16S rRNA (V1-V2) amplicon sequencing. We benchmarked performance by using a 20-organism mock bacterial community and a collection of primary human specimens. We observed comparatively higher error rates with the Ion Torrent platform and report a pattern of premature sequence truncation specific to semiconductor sequencing. Read truncation was dependent on both the directionality of sequencing and the target species, resulting in organism-specific biases in community profiles. We found that these sequencing artifacts could be minimized by using bidirectional amplicon sequencing and an optimized flow order on the Ion Torrent platform. Results of bacterial community profiling performed on the mock community and a collection of 18 human-derived microbiological specimens were generally in good agreement for both platforms; however, in some cases, results differed significantly. Disparities could be attributed to the failure to generate full-length reads for particular organisms on the Ion Torrent platform, organism-dependent differences in sequence error rates affecting classification of certain species, or some combination of these factors. This study demonstrates the potential for differential bias in bacterial community profiles resulting from the choice of sequencing platform alone.
Subject(s)
Bacteria/isolation & purification , Bacterial Infections/microbiology , Bacterial DNA/genetics , High-Throughput Nucleotide Sequencing/methods , 16S Ribosomal RNA/genetics , Bacteria/classification , Bacteria/genetics , High-Throughput Nucleotide Sequencing/instrumentation , Humans
ABSTRACT
In 1994, two independent groups extracted DNA from several Pleistocene epoch mammoths and noted differences among individual specimens. Subsequently, DNA sequences have been published for a number of extinct species. However, such ancient DNA is often fragmented and damaged, and studies to date have typically focused on short mitochondrial sequences, never yielding more than a fraction of a per cent of any nuclear genome. Here we describe 4.17 billion bases (Gb) of sequence from several mammoth specimens, 3.3 billion (80%) of which are from the woolly mammoth (Mammuthus primigenius) genome and thus comprise an extensive set of genome-wide sequence from an extinct species. Our data support earlier reports that elephantid genomes exceed 4 Gb. The estimated divergence rate between mammoth and African elephant is half of that between human and chimpanzee. The observed number of nucleotide differences between two particular mammoths was approximately one-eighth of that between one of them and the African elephant, corresponding to a separation between the mammoths of 1.5-2.0 Myr. The estimated probability that orthologous elephant and mammoth amino acids differ is 0.002, corresponding to about one residue per protein. Differences were discovered between mammoth and African elephant in amino-acid positions that are otherwise invariant over several billion years of combined mammalian evolution. This study shows that nuclear genome sequencing of extinct species can reveal population differences not evident from the fossil record, and perhaps even discover genetic factors that affect extinction.
Subject(s)
Cell Nucleus/genetics , Elephants/genetics , Molecular Evolution , Biological Extinction , Fossils , Genome/genetics , Genomics , DNA Sequence Analysis/methods , Africa , Animals , Conserved Sequence/genetics , Elephants/anatomy & histology , Female , Hair/metabolism , Humans , India , Male , Phylogeny
ABSTRACT
Anticancer drugs are effective against tumors that depend on the molecular target of the drug. Known targets of cytotoxic anticancer drugs are involved in cell proliferation; drugs acting on such targets are ineffective against nonproliferating tumor cells, survival of which leads to eventual therapy failure. Function-based genomic screening identified the coatomer protein complex ζ1 (COPZ1) gene as essential for different tumor cell types but not for normal cells. COPZ1 encodes a subunit of coatomer protein complex 1 (COPI) involved in intracellular traffic and autophagy. The knockdown of COPZ1, but not of COPZ2 encoding isoform coatomer protein complex ζ2, caused Golgi apparatus collapse, blocked autophagy, and induced apoptosis in both proliferating and nondividing tumor cells. In contrast, inhibition of normal cell growth required simultaneous knockdown of both COPZ1 and COPZ2. COPZ2 (but not COPZ1) was down-regulated in the majority of tumor cell lines and in clinical samples of different cancer types. Reexpression of COPZ2 protected tumor cells from killing by COPZ1 knockdown, indicating that tumor cell dependence on COPZ1 is the result of COPZ2 silencing. COPZ2 displays no tumor-suppressive activities, but it harbors microRNA 152, which is silenced in tumor cells concurrently with COPZ2 and acts as a tumor suppressor in vitro and in vivo. Silencing of microRNA 152 in different cancers and the ensuing down-regulation of its host gene COPZ2 offer a therapeutic opportunity for proliferation-independent selective killing of tumor cells by COPZ1-targeting agents.
Subject(s)
Coatomer Protein/genetics , Neoplasms/genetics , Apoptosis/genetics , Autophagy/genetics , Base Sequence , Tumor Cell Line , Neoplasm DNA/genetics , Female , Gene Knockdown Techniques , Gene Silencing , Golgi Apparatus/genetics , Golgi Apparatus/pathology , Humans , Male , MicroRNAs/genetics , Neoplasms/pathology , Protein Isoforms/genetics , Neoplasm RNA/genetics , Small Interfering RNA/genetics , Genetic Suppression
ABSTRACT
Leaf-cutter ants are one of the most important herbivorous insects in the Neotropics, harvesting vast quantities of fresh leaf material. The ants use leaves to cultivate a fungus that serves as the colony's primary food source. This obligate ant-fungus mutualism is one of the few occurrences of farming by non-humans and likely facilitated the formation of their massive colonies. Mature leaf-cutter ant colonies contain millions of workers ranging in size from small garden tenders to large soldiers, resulting in one of the most complex polymorphic caste systems within ants. To begin uncovering the genomic underpinnings of this system, we sequenced the genome of Atta cephalotes using 454 pyrosequencing. One prediction from this ant's lifestyle is that it has undergone genetic modifications that reflect its obligate dependence on the fungus for nutrients. Analysis of this genome sequence is consistent with this hypothesis, as we find evidence for reductions in genes related to nutrient acquisition. These include extensive reductions in serine proteases (which are likely unnecessary because proteolysis is not a primary mechanism used to process nutrients obtained from the fungus), a loss of genes involved in arginine biosynthesis (suggesting that this amino acid is obtained from the fungus), and the absence of a hexamerin (which sequesters amino acids during larval development in other insects). Following recent reports of genome sequences from other insects that engage in symbioses with beneficial microbes, the A. cephalotes genome provides new insights into the symbiotic lifestyle of this ant and advances our understanding of host-microbe symbioses.
Subject(s)
Ants/physiology , Insect Genome/genetics , Plant Leaves/physiology , Symbiosis , Animals , Ants/genetics , Arginine/genetics , Arginine/metabolism , Base Sequence , Fungi/genetics , Insect Proteins/genetics , Insect Proteins/metabolism , DNA Sequence Analysis , Serine Proteases/genetics , Serine Proteases/metabolism
ABSTRACT
Metastatic castration-resistant prostate cancer (mCRPC) is a lethal disease, and molecular markers that differentiate indolent from aggressive subtypes are needed. We sequenced the exomes of five metastatic tumors and healthy kidney tissue from an index case with mCRPC to identify lesions associated with disease progression and metastasis. An Ashkenazi Jewish (AJ) germline founder mutation, del185AG in BRCA1, was observed and AJ ancestry was confirmed. Sixty-two somatic variants altered proteins in tumors, including cancer-associated genes, TMPRSS2-ERG, PBRM1, and TET2. The majority (n = 53) of somatic variants were present in all metastases and only a subset (n = 31) was observed in the primary tumor. Integrating tumor next-generation sequencing and DNA copy number showed somatic loss of BRCA1 and TMPRSS2-ERG. We sequenced 19 genes with deleterious mutations in the index case in additional mCRPC samples and detected a frameshift, two somatic missense alterations, tumor loss of heterozygosity, and combinations of germline missense SNPs in TET2. In summary, genetic analysis of metastases from an index case permitted us to infer a chronology for the clonal spread of disease based on sequential accrual of somatic lesions. The role of TET2 in mCRPC deserves additional analysis and may define a subset of metastatic disease.