ABSTRACT
Horizontal gene transfer (HGT), or the transfer of genes between species, has been recognized recently as more pervasive than previously suspected. Here, we report evidence for an unprecedented degree of HGT into an animal genome, based on a draft genome of a tardigrade, Hypsibius dujardini. Tardigrades are microscopic eight-legged animals that are famous for their ability to survive extreme conditions. Genome sequencing, direct confirmation of physical linkage, and phylogenetic analysis revealed that a large fraction of the H. dujardini genome is derived from diverse bacteria as well as plants, fungi, and Archaea. We estimate that approximately one-sixth of tardigrade genes entered by HGT, nearly double the fraction found in the most extreme cases of HGT into animals known to date. Foreign genes have supplemented, expanded, and even replaced some metazoan gene families within the tardigrade genome. Our results demonstrate that an unexpectedly large fraction of an animal genome can be derived from foreign sources. We speculate that animals that can survive extremes may be particularly prone to acquiring foreign genes.
Subjects
Gene Transfer, Horizontal; Genome/genetics; Genomic Library; Sequence Analysis, DNA/methods; Tardigrada/genetics; Animals; DNA, Archaeal/chemistry; DNA, Archaeal/genetics; DNA, Bacterial/chemistry; DNA, Bacterial/genetics; DNA, Fungal/chemistry; DNA, Fungal/genetics; DNA, Plant/chemistry; DNA, Plant/genetics; DNA, Viral/chemistry; DNA, Viral/genetics; Phylogeny; Tardigrada/classification
ABSTRACT
Using comparative sequencing approaches, we investigated the evolutionary history of the European-enriched 17q21.31 MAPT inversion polymorphism. We present a detailed, BAC-based sequence assembly of the inverted human H2 haplotype and compare it to the sequence structure and genetic variation of the corresponding 1.5-Mb region for the noninverted H1 human haplotype and that of chimpanzee and orangutan. We found that inversion of the MAPT region is similarly polymorphic in other great ape species, and we present evidence that the inversions occurred independently in chimpanzees and humans. In humans, the inversion breakpoints correspond to core duplications with the LRRC37 gene family. Our analysis favors the H2 configuration and sequence haplotype as the likely great ape and human ancestral state, with inversion recurrences during primate evolution. We show that the H2 architecture has evolved more extensive sequence homology, perhaps explaining its tendency to undergo microdeletion associated with mental retardation in European populations.
Subjects
Chromosome Inversion; Chromosomes, Human, Pair 17; Evolution, Molecular; Polymorphism, Genetic; tau Proteins/genetics; Animals; Base Sequence; Gene Duplication; Humans; Models, Biological; Molecular Sequence Data; Pan troglodytes/genetics; Phylogeny; Pongo pygmaeus/genetics; Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Thymomas are one of the most rarely diagnosed malignancies. To better understand their biology and to identify therapeutic targets, we performed next-generation RNA sequencing. METHODS: The RNA was sequenced from 13 thymic malignancies and 3 normal thymus glands. Validation of microRNA expression was performed on a separate set of 35 thymic malignancies. For cell-based studies, a thymoma cell line was used. RESULTS: Hierarchical clustering revealed 100% concordance between gene expression clusters and WHO subtype. A substantial differentiator was a large microRNA cluster on chr19q13.42 that was significantly overexpressed in all A and AB tumours and whose expression was virtually absent in the other thymomas and normal tissues. Overexpression of this microRNA cluster activates the PI3K/AKT/mTOR pathway. Treatment of a thymoma AB cell line with a panel of PI3K/AKT/mTOR inhibitors resulted in marked reduction of cell viability. CONCLUSIONS: A large microRNA cluster on chr19q13.42 is a transcriptional hallmark of type A and AB thymomas. Furthermore, this cluster activates the PI3K pathway, suggesting the possible exploration of PI3K inhibitors in patients with these subtypes of tumour. This work has led to the initiation of a phase II clinical trial of PI3K inhibition in relapsed or refractory thymomas (http://clinicaltrials.gov/ct2/show/NCT02220855).
Subjects
Chromosomes, Human, Pair 19; MicroRNAs/genetics; Thymoma/genetics; Thymus Neoplasms/genetics; Humans; Thymoma/classification
ABSTRACT
INTRODUCTION: Our efforts to prevent and treat breast cancer are significantly impeded by a lack of knowledge of the biology and developmental genetics of the normal mammary gland. In order to provide the specimens that will facilitate such an understanding, The Susan G. Komen for the Cure Tissue Bank at the IU Simon Cancer Center (KTB) was established. The KTB is, to our knowledge, the only biorepository in the world prospectively established to collect normal, healthy breast tissue from volunteer donors. As a first initiative toward a molecular understanding of the biology and developmental genetics of the normal mammary gland, the effect of the menstrual cycle and hormonal contraceptives on gene expression in the normal breast epithelium was examined. METHODS: Using normal breast tissue from 20 premenopausal donors to KTB, the changes in the mRNA of the normal breast epithelium as a function of phase of the menstrual cycle and hormonal contraception were assayed using next-generation whole transcriptome sequencing (RNA-Seq). RESULTS: In total, 255 genes representing 1.4% of all genes were deemed to have statistically significant differential expression between the two phases of the menstrual cycle. The overwhelming majority (221; 87%) of the genes have higher expression during the luteal phase. These data provide important insights into the processes occurring during each phase of the menstrual cycle. There was only a single gene significantly differentially expressed when comparing the epithelium of women using hormonal contraception to those in the luteal phase. CONCLUSIONS: We have taken advantage of a unique research resource, the KTB, to complete the first-ever next-generation transcriptome sequencing of the epithelial compartment of 20 normal human breast specimens. This work has produced a comprehensive catalog of the differences in the expression of protein-coding genes as a function of the phase of the menstrual cycle. These data constitute the beginning of a reference data set of the normal mammary gland, which can be consulted for comparison with data developed from malignant specimens, or to mine the effects of the hormonal flux that occurs during the menstrual cycle.
Subjects
Breast/metabolism; Epithelium/metabolism; High-Throughput Nucleotide Sequencing/methods; Premenopause/genetics; Tissue Banks; Transcriptome/genetics; Adult; Algorithms; Female; Follicular Phase/genetics; Gene Regulatory Networks; Humans; Linear Models; Luteal Phase/genetics; Middle Aged; Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
Triple-negative breast cancers (TNBCs) are a heterogeneous set of tumors defined by an absence of actionable therapeutic targets (ER, PR, and HER-2). Microdissected normal ductal epithelium from healthy volunteers represents a novel comparator to reveal insights into TNBC heterogeneity and to inform drug development. Using RNA-sequencing data from our institution and The Cancer Genome Atlas (TCGA) we compared the transcriptomes of 94 TNBCs, 20 microdissected normal breast tissues from healthy volunteers from the Susan G. Komen for the Cure Tissue Bank, and 10 histologically normal tissues adjacent to tumor. Pathway analysis comparing TNBCs to optimized normal controls of microdissected normal epithelium versus classic controls composed of adjacent normal tissue revealed distinct molecular signatures. Differential gene expression of TNBC compared with normal comparators demonstrated important findings for TNBC-specific clinical trials testing targeted agents; lack of over-expression for negative studies and over-expression in studies with drug activity. Next, by comparing each individual TNBC to the set of microdissected normals, we demonstrate that TNBC heterogeneity is attributable to transcriptional chaos, is associated with non-silent DNA mutational load, and explains transcriptional heterogeneity in addition to known molecular subtypes. Finally, chaos analysis identified 146 core genes dysregulated in >90 % of TNBCs revealing an over-expressed central network. In conclusion, use of microdissected normal ductal epithelium from healthy volunteers enables an optimized approach for studying TNBC and uncovers biological heterogeneity mediated by transcriptional chaos.
Subjects
Triple Negative Breast Neoplasms/genetics; Triple Negative Breast Neoplasms/pathology; Case-Control Studies; Cluster Analysis; Female; Forkhead Box Protein M1; Forkhead Transcription Factors/metabolism; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; Mammary Glands, Human/metabolism; Microdissection; Mutation; Sequence Analysis, RNA; Transcription, Genetic; Triple Negative Breast Neoplasms/drug therapy
ABSTRACT
Acute myeloid leukaemia is a highly malignant haematopoietic tumour that affects about 13,000 adults in the United States each year. The treatment of this disease has changed little in the past two decades, because most of the genetic events that initiate the disease remain undiscovered. Whole-genome sequencing is now possible at a reasonable cost and timeframe to use this approach for the unbiased discovery of tumour-specific somatic mutations that alter the protein-coding genes. Here we present the results obtained from sequencing a typical acute myeloid leukaemia genome, and its matched normal counterpart obtained from the same patient's skin. We discovered ten genes with acquired mutations; two were previously described mutations that are thought to contribute to tumour progression, and eight were new mutations present in virtually all tumour cells at presentation and relapse, the function of which is not yet known. Our study establishes whole-genome sequencing as an unbiased method for discovering cancer-initiating mutations in previously unidentified genes that may respond to targeted therapies.
Subjects
Gene Expression Regulation, Neoplastic/genetics; Genome, Human/genetics; Leukemia, Myeloid, Acute/genetics; Case-Control Studies; Disease Progression; Gene Expression Profiling; Genomics; Humans; Mutagenesis, Insertional; Mutation; Polymorphism, Single Nucleotide; Recurrence; Sequence Analysis, DNA; Sequence Deletion; Skin/metabolism
ABSTRACT
PURPOSE: Anti-PD-1 therapy provides clinical benefit in 40-50% of patients with relapsed and/or metastatic head and neck squamous cell carcinoma (RM-HNSCC). Selection of anti-PD-1 therapy is typically based on patient PD-L1 immunohistochemistry (IHC), which has low specificity for predicting disease control. Therefore, there is a critical need for a clinical biomarker that will predict clinical benefit to anti-PD-1 treatment with high specificity. METHODS: Clinical treatment and outcomes data for 103 RM-HNSCC patients were paired with RNA-sequencing data from formalin-fixed patient samples. Using logistic regression methods, we developed a novel biomarker classifier based on expression patterns in the tumor immune microenvironment to predict disease control with monotherapy PD-1 inhibitors (pembrolizumab and nivolumab). The performance of the biomarker was internally validated using out-of-bag methods. RESULTS: The biomarker significantly predicted disease control (65% in predicted non-progressors vs. 17% in predicted progressors, p < 0.001) and was significantly correlated with overall survival (OS; p = 0.004). In addition, the biomarker outperformed PD-L1 IHC across numerous metrics including sensitivity (0.79 vs 0.64, respectively; p = 0.005) and specificity (0.70 vs 0.61, respectively; p = 0.009). CONCLUSION: This novel assay uses tumor immune microenvironment expression data to predict disease control and OS with high sensitivity and specificity in patients with RM-HNSCC treated with anti-PD-1 monotherapy.
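The sensitivity and specificity figures quoted in this abstract follow the standard confusion-matrix definitions. The sketch below shows how such values are computed from binary outcomes and predictions; the function name and the toy labels are hypothetical and are not data or code from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumption for illustration: 1 = disease control, 0 = progression.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels, not patient data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity is the fraction of true disease-control cases the classifier calls correctly, and specificity is the fraction of true progressors it calls correctly, which is why the abstract reports both when comparing against PD-L1 IHC.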
ABSTRACT
Anti-PD-1 therapy can provide long, durable benefit to a fraction of patients. The on-label PD-L1 test, however, does not accurately predict response. To build a better biomarker, we created a method called T Cell Subtype Profiling (TCSP) that characterizes the abundance of T cell subtypes (TCSs) in FFPE specimens using five RNA models. These TCS RNA models are created using functional methods, and robustly discriminate between naïve, activated, exhausted, effector memory, and central memory TCSs, without the reliance on non-specific, classical markers. TCSP is analytically valid and corroborates associations between TCSs and clinical outcomes. Multianalyte biomarkers based on TCS estimates predicted response to anti-PD-1 therapy in three different cancers and outperformed the indicated PD-L1 test, as well as Tumor Mutational Burden. Given the utility of TCSP, we investigated the abundance of TCSs in TCGA cancers and created a portal to enable researchers to discover other TCSP-based biomarkers.
Subjects
CD8-Positive T-Lymphocytes/metabolism; Neoplasms/drug therapy; Programmed Cell Death 1 Receptor/metabolism; Biomarkers, Tumor/metabolism; CD8-Positive T-Lymphocytes/pathology; Cells, Cultured; Humans; Leukocytes, Mononuclear
ABSTRACT
Massively parallel sequencing instruments enable rapid and inexpensive DNA sequence data production. Because these instruments are new, their data require characterization with respect to accuracy and utility. To address this, we sequenced a Caenorhabditis elegans N2 Bristol strain isolate using the Solexa Sequence Analyzer, and compared the reads to the reference genome to characterize the data and to evaluate coverage and representation. Massively parallel sequencing facilitates strain-to-reference comparison for genome-wide sequence variant discovery. Owing to the short-read-length sequences produced, we developed a revised approach to determine the regions of the genome to which short reads could be uniquely mapped. We then aligned Solexa reads from C. elegans strain CB4858 to the reference, and screened for single-nucleotide polymorphisms (SNPs) and small indels. This study demonstrates the utility of massively parallel short read sequencing for whole genome resequencing and for accurate discovery of genome-wide polymorphisms.
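The abstract does not describe how uniquely mappable regions were determined. One simple way to illustrate the idea is to mark every start position whose read-length substring occurs exactly once in the reference. The sketch below does this under loud simplifying assumptions: exact matches only, forward strand only, and a toy sequence. It is not the authors' method.

```python
from collections import Counter


def uniquely_mappable_starts(genome, read_len):
    """Return start positions whose read_len-mer occurs exactly once.

    Simplifying assumptions: exact matching, forward strand only,
    no sequencing errors.
    """
    kmers = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(kmers)
    return [i for i, kmer in enumerate(kmers) if counts[kmer] == 1]


# Hypothetical toy reference; "ACGT" occurs twice, so positions 0 and 4
# are not uniquely mappable at read length 4.
toy_genome = "ACGTACGTTT"
starts = uniquely_mappable_starts(toy_genome, 4)
print(starts)
```

A production analysis would also consider the reverse strand and allow mismatches, which shrinks the uniquely mappable fraction further.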
Subjects
Caenorhabditis elegans/genetics; Chromosome Mapping/methods; DNA Mutational Analysis/methods; Genetic Variation/genetics; Polymorphism, Single Nucleotide/genetics; Sequence Analysis, DNA/methods; Animals; Base Sequence; Molecular Sequence Data
ABSTRACT
Gallus GBrowse (http://birdbase.net/cgi-bin/gbrowse/gallus/) provides online access to genomic and other information about the chicken, Gallus gallus. The information provided by this resource includes predicted genes and Gene Ontology (GO) terms, links to Gallus In Situ Hybridization Analysis (GEISHA), Unigene and Reactome, the genomic positions of chicken genetic markers, SNPs and microarray probes, and mappings from turkey, condor and zebra finch DNA and EST sequences to the chicken genome. We also provide a BLAT server (http://birdbase.net/cgi-bin/webBlat) for matching user-provided sequences to the chicken genome. These tools make the Gallus GBrowse server a valuable resource for researchers seeking genomic information regarding the chicken and other avian species.
Subjects
Chickens/genetics; Databases, Genetic; Genomics; Animals; Chickens/metabolism; Chromosome Mapping; Gene Expression Profiling; Genetic Markers; Internet; Polymorphism, Single Nucleotide; Software; User-Computer Interface
ABSTRACT
As immuno-oncology drugs grow more popular in the treatment of cancer, better methods are needed to quantify the tumor immune cell component to determine which patients are most likely to benefit from treatment. Methods such as flow cytometry can accurately assess the composition of infiltrating immune cells; however, they show limited use in formalin-fixed, paraffin-embedded (FFPE) specimens. This article describes a novel hybrid-capture RNA sequencing assay, ImmunoPrism, that estimates the relative percentage abundance of eight immune cell types in FFPE solid tumors. Immune health expression models were generated using machine learning methods and used to uniquely identify each immune cell type using the most discriminatively expressed genes. The analytical performance of the assay was assessed using 101 libraries from 40 FFPE and 32 fresh-frozen samples. With defined samples, ImmunoPrism had a precision of ±2.72%, a total error of 2.75%, and a strong correlation (r² = 0.81; P < 0.001) to flow cytometry. ImmunoPrism had similar performance in dissociated tumor cell samples (total error of 8.12%) and correlated strongly with immunohistochemistry (CD8: r² = 0.83; P < 0.001) in FFPE samples. Other performance metrics were determined, including limit of detection, reportable range, and reproducibility. The approach used for analytical validation is shared here so that it may serve as a helpful framework for other laboratories when validating future complex RNA-based assays.
Subjects
Computational Biology/methods; Gene Expression Profiling/methods; Immunomodulation/genetics; Neoplasms/genetics; Neoplasms/immunology; Computational Biology/standards; Gene Expression Profiling/standards; High-Throughput Nucleotide Sequencing; Humans; Immunohistochemistry; Leukocytes, Mononuclear/immunology; Leukocytes, Mononuclear/metabolism; Lymphocytes/immunology; Lymphocytes/metabolism; Neoplasms/metabolism; Neoplasms/pathology; Reproducibility of Results; Sensitivity and Specificity; Sequence Analysis, RNA
ABSTRACT
BACKGROUND: Genomic studies in non-domestic avian models, such as the California condor and white-throated sparrow, can lead to more comprehensive conservation plans and provide clues for understanding mechanisms affecting genetic variation, adaptation and evolution. Developing genomic tools and resources including genomic libraries and a genetic map of the California condor is a prerequisite for identification of candidate loci for a heritable embryonic lethal condition. The white-throated sparrow exhibits a stable genetic polymorphism (i.e. chromosomal rearrangements) associated with variation in morphology, physiology, and behavior (e.g., aggression, social behavior, sexual behavior, parental care). In this paper we outline the utility of these species as well as report on recent advances in the study of their genomes. RESULTS: Genotyping of the condor resource population at 17 microsatellite loci provided a better assessment of the current population's genetic variation. Specific New World vulture repeats were found in the condor genome. Using condor BAC library and clones, chicken-condor comparative maps were generated. A condor fibroblast cell line transcriptome was characterized using the 454 sequencing technology. Our karyotypic analyses of the sparrow in combination with other studies indicate that the rearrangements in both chromosomes 2m and 3a are complex and likely involve multiple inversions, interchromosomal linkage, and pleiotropy. At least a portion of the rearrangement in chromosome 2m existed in the common ancestor of the four North American species of Zonotrichia, but not in the one South American species, and the 2m form, originally thought to be the derived condition, might actually be the ancestral one.
CONCLUSION: Mining and characterization of candidate loci in the California condor using molecular genetic and genomic techniques as well as linkage and comparative genomic mapping will eventually enable the identification of carriers of the chondrodystrophy allele, resulting in improved genetic management of this disease. In the white-throated sparrow, genomic studies, combined with ecological data, will help elucidate the basis of genic selection in a natural population. Morphs of the sparrow provide us with a unique opportunity to study intraspecific genomic differences, which have resulted from two separate yet linked evolutionary trajectories. Such results can transform our understanding of evolutionary and conservation biology.
Subjects
Conservation of Natural Resources; Genomics; Raptors/genetics; Sparrows/genetics; Animals; Chromosomes, Artificial, Bacterial; Evolution, Molecular; Female; Gene Library; Genetic Linkage; Genetic Variation; Genetics, Population; Karyotyping; Microsatellite Repeats; Sequence Analysis, DNA
ABSTRACT
Leaf rust, caused by Puccinia triticina Eriks., is one of the most widespread diseases of wheat and breeding for resistance is one of the most effective methods of control. Lr16 is a wheat leaf rust resistance gene (R-gene) that provides resistance at both the seedling and adult stages. Simple-sequence repeat (SSR) markers have been used to map Lr16 to the distal end of chromosome 2B. The objectives of this study were to use RNA sequencing (RNA-seq) and in silico subtraction to identify new R-gene analogs (RGAs) and use them as Lr16 markers. RNA was isolated from the susceptible wheat cultivar Thatcher (Tc) and the resistant Tc isolines TcLr10, TcLr16, TcLr21, and sequenced using Illumina technology. Using in silico subtraction, sequences from the resistant Tc isolines were aligned to a Tc reference expressed sequence tag (EST) set. Sequences not aligning to the Tc reference were assembled into contigs and analyzed using BLASTx to determine putative gene functions. Primer pairs were designed for 181 RGA sequences, of which, 137 amplified in at least one of the parents. A mapping population was developed with 165 F2 lines from a cross between the rust-susceptible cultivar Chinese Spring (CS) and TcLr16. Two RGA markers XTaLr16_RGA266585 and XTaLr16_RGA22128 were identified that mapped proximally 1.2 and 23.8 cM from Lr16, respectively. Three SSR markers Xwmc764, Xwmc661, and Xbarc35 mapped between these two RGA markers at distances of 5.0, 10.9, and 16.1 cM from Lr16, respectively. In silico subtraction is an effective technique for isolating RGAs linked to R-genes of interest.
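The in silico subtraction step described above keeps only sequences from the resistant isolines that do not align to the susceptible Thatcher reference. The sketch below illustrates that filtering logic with exact substring matching standing in for a real aligner; the function name and the toy sequences are hypothetical, and the study itself used sequence alignment against an EST set followed by contig assembly and BLASTx.

```python
def in_silico_subtract(reads, reference_seqs):
    """Keep reads that do not occur in any reference sequence.

    Assumption: exact substring matching is used here only as a
    stand-in for alignment with a real aligner.
    """
    return [r for r in reads if not any(r in ref for ref in reference_seqs)]


# Hypothetical toy data, not sequences from the study.
reference = ["ATGGCCATTGTAATGGGCC", "GGGTTTAAACCC"]
reads = ["GCCATTG", "TTTAAAC", "CCCCGGGG", "ATATATAT"]
unaligned = in_silico_subtract(reads, reference)
print(unaligned)
```

The reads that survive the filter are the candidates that would then be assembled into contigs and screened for resistance gene analogs.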
ABSTRACT
The recent discovery of bovine haplotypes with negative effects on fertility in the Brown Swiss, Holstein, and Jersey breeds has allowed producers to identify carrier animals using commercial single nucleotide polymorphism (SNP) genotyping assays. This study was devised to identify the causative mutations underlying defective bovine embryo development contained within three of these haplotypes (Brown Swiss haplotype 1 and Holstein haplotypes 2 and 3) by combining exome capture with next generation sequencing. Of the 68,476,640 sequence variations (SV) identified, only 1,311 genome-wide SNP were concordant with the haplotype status of 21 sequenced carriers. Validation genotyping of 36 candidate SNP identified only 1 variant that was concordant to Holstein haplotype 3 (HH3), while no variants located within the refined intervals for HH2 or BH1 were concordant. The variant strictly associated with HH3 is a non-synonymous SNP (T/C) within exon 24 of the Structural Maintenance of Chromosomes 2 (SMC2) on Chromosome 8 at position 95,410,507 (UMD3.1). This polymorphism changes amino acid 1135 from phenylalanine to serine and causes a non-neutral, non-tolerated, and evolutionarily unlikely substitution within the NTPase domain of the encoded protein. Because only exome capture sequencing was used, we could not rule out the possibility that the true causative mutation for HH3 might lie in a non-exonic genomic location. Given the essential role of SMC2 in DNA repair, chromosome condensation and segregation during cell division, our findings strongly support the non-synonymous SNP (T/C) in SMC2 as the likely causative mutation. The absence of concordant variations for HH2 or BH1 suggests either the underlying causative mutations lie within a non-exomic region or in exome regions not covered by the capture array.
Subjects
Cattle/genetics; Exome; Haplotypes; Nuclear Proteins/genetics; Point Mutation; Polymorphism, Single Nucleotide; Animals; Chromosomes, Mammalian/genetics; DNA Mutational Analysis; DNA Repair/genetics; Female; Male
ABSTRACT
We have sequenced 36,641 expressed sequence tags from laser capture microdissected adult mouse gastric and small intestinal epithelial progenitors, obtaining 4031 and 3324 unique transcripts, respectively. Using Gene Ontology (GO) terms, each data set was compared with cDNA libraries from intact adult stomach and small intestine. Genes in GO categories enriched in progenitors were filtered against genes in GO categories represented in hematopoietic, neural, and embryonic stem cell transcriptomes and mapped onto transcription factor networks, plus canonical signal transduction and metabolic pathways. Wnt/beta-catenin, phosphoinositide-3/Akt kinase, insulin-like growth factor-1, vascular endothelial growth factor, integrin, and gamma-aminobutyric acid receptor signaling cascades, plus glycerolipid, fatty acid, and amino acid metabolic pathways are among those prominently represented in adult gut progenitors. The results reveal shared as well as distinctive features of adult gut stem cells when compared with other stem cell populations.
Subjects
Epithelial Cells/metabolism; Epithelium/metabolism; Gastric Mucosa/metabolism; Intestinal Mucosa/metabolism; Animals; Computational Biology; DNA, Complementary/metabolism; Expressed Sequence Tags; Gene Library; Genome; Hematopoietic Stem Cells/metabolism; Immunohistochemistry; Intestine, Small/metabolism; Lasers; Mice; Models, Biological; Neurons/metabolism; RNA, Messenger/metabolism; Signal Transduction; Software; Stem Cells/metabolism; Transcription Factors/metabolism
ABSTRACT
Transcription factors (TFs) are essential regulators of gene expression, and mutated TF genes have been shown to cause numerous human genetic diseases. Yet to date, no single, comprehensive database of human TFs exists. In this work, we describe the collection of an essentially complete set of TF genes from one depiction of the human ORFeome, and the design of a microarray to interrogate their expression. Taking 1468 known TFs from TRANSFAC, InterPro, and FlyBase, we used this seed set to search the ScriptSure human transcriptome database for additional genes. ScriptSure's genome-anchored transcript clusters allowed us to work with a nonredundant high-quality representation of the human transcriptome. We used a high-stringency similarity search by using BLASTN, and a protein motif search of the human ORFeome by using hidden Markov models of DNA-binding domains known to occur exclusively or primarily in TFs. Four hundred ninety-four additional TF genes were identified in the overlap between the two searches, bringing our estimate of the total number of human TFs to 1962. Zinc finger genes are by far the most abundant family (762 members), followed by homeobox (199 members) and basic helix-loop-helix genes (117 members). We designed a microarray of 50-mer oligonucleotide probes targeted to a unique region of the coding sequence of each gene. We have successfully used this microarray to interrogate TF gene expression in species as diverse as chickens and mice, as well as in humans.
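The TF count in this abstract combines a seed set with genes found by both searches: 1468 seed TFs plus 494 additional genes from the overlap gives 1962. The sketch below shows that set logic on toy identifiers. The function name and gene IDs are hypothetical, and treating "additional" as overlap hits not already in the seed set is an assumption, since the abstract does not spell out the bookkeeping.

```python
def combine_tf_sets(seed, blastn_hits, hmm_hits):
    """Return (additional, total) TF gene sets.

    Assumption: a gene counts as an additional TF only if both
    searches report it and it is not already in the seed set.
    """
    additional = (blastn_hits & hmm_hits) - seed
    return additional, seed | additional


# Hypothetical toy identifiers, not genes from the study.
seed = {"TF1", "TF2", "TF3"}
blastn_hits = {"TF2", "G10", "G11", "G12"}
hmm_hits = {"G11", "G12", "G13", "TF3"}
additional, total = combine_tf_sets(seed, blastn_hits, hmm_hits)
print(sorted(additional), len(total))

# The totals reported in the abstract are consistent with this bookkeeping.
assert 1468 + 494 == 1962
```

Requiring agreement between the similarity search and the domain-model search trades recall for precision, which fits the abstract's description of a high-stringency approach.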